Abstract

In this paper we study stochastic dynamic games with many players; these are a fundamental model 
for a wide range of economic applications. The standard solution concept for such games is Markov 
perfect equilibrium (MPE), but it is well known that MPE computation becomes intractable as the num- 
ber of players increases. We instead consider the notion of stationary equilibrium (SE), where players 
optimize assuming the empirical distribution of others' states remains constant at its long run average. 
We make two main contributions. First, we provide a rigorous justification for using SE. In particular, 
we provide a parsimonious collection of exogenous conditions over model primitives that guarantee ex- 
istence of SE, and ensure that an appropriate approximation property to MPE holds, in a general model 
with possibly unbounded state spaces. Second, we draw a significant connection between the validity of 
SE and market structure: under the same conditions that imply that SE exist and approximate MPE well,
the market becomes fragmented in the limit of many firms. To illustrate this connection, we study in 
detail a series of dynamic oligopoly examples. These examples show that our conditions enforce a form 
of "decreasing returns to larger states"; this yields fragmented industries in the limit. By contrast, viola- 
tion of these conditions suggests "increasing returns to larger states" and potential market concentration. 
In that sense, our work uses a fully dynamic framework to also contribute to a longstanding issue in 
industrial organization: understanding the determinants of market structure in different industries.

1 Introduction



A common framework to study dynamic economic systems of interacting agents is a stochastic game, as 
pioneered by Shapley (1953). In a stochastic game agents' actions directly affect underlying state variables 
that influence their payoff. The state variables evolve according to a Markov process in discrete time, and 
players maximize their infinite horizon expected discounted payoff. Stochastic games provide a valuable 
general framework for a range of economic settings, including dynamic oligopolies — i.e., models of compe- 
tition among firms over time. In particular, since the introduction of the dynamic oligopoly model of Ericson 
and Pakes (1995), they have been extensively used to study industry dynamics with heterogeneous firms in 
different applied settings (see Doraszelski and Pakes (2007) for a survey of this literature). 

The standard solution concept for stochastic games is Markov perfect equilibrium (MPE) (Fudenberg 
and Tirole 1991), where a player's equilibrium strategy depends on the current state of all players. MPE 
presents two significant obstacles as an analytical tool, particularly as the number of players grows large. 
First is computability: the state space expands in dimension with the number of players, and thus the "curse 
of dimensionality" kicks in, making computation of MPE infeasible in many problems of practical interest. 
Second is plausibility: as the number of players grows large, it becomes increasingly difficult to believe that 
individual players track the exact behavior of the other agents. 

To overcome these difficulties, previous research has considered an asymptotic regime in which the 
number of agents is infinite (Jovanovic and Rosenthal 1988, Hopenhayn 1992). In this case, individuals 
take a simpler view of the world: they postulate that fluctuations in the empirical distribution of other 
players' states have "averaged out" due to a law of large numbers, and thus they optimize holding the state 
distribution of other players fixed. Based on this insight, this approach considers an equilibrium concept 
where agents optimize only with respect to the long run average of the distribution of other players' states; 
Hopenhayn (1992) refers to this concept as stationary equilibrium (SE), and we adopt his terminology. SE 
are much simpler to compute and analyze than MPE, making this a useful approach across a wide range of 
applications. In particular, SE of infinite models have also been extensively used to study industry dynamics 
(see, for example, Luttmer 2007, Melitz 2003, Klette and Kortum 2004, and Hopenhayn and Rogerson 
1993). 

In this paper, we address two significant questions. First, under what conditions is it justifiable to 
use SE as a modeling tool? We provide theoretical foundations for the use of SE. In particular, our main 
results provide a parsimonious collection of exogenous conditions over model primitives that guarantee 
existence of SE, and ensure that an appropriate approximation property holds. These results provide a 
rigorous justification for using SE of infinite models to study stochastic games with a large but finite number 
of players. 

The second question we address relates to a longstanding topic of research in industrial organization: 
when do industries fragment, and when do they concentrate? In a fragmented industry all firms have small 
market shares, with no single firm or group of firms becoming dominant. By contrast, in a concentrated 
industry, few participants that hold a notable market share can exert significant market power. In dynamic 
oligopoly models in particular, this is a challenging question to answer due to the inherent complexity of 
MPE. Our second main contribution is to draw a significant connection between the validity of SE, and
market structure: under the same conditions that imply SE exist and an appropriate approximation property
holds, the market becomes fragmented in the limit of many firms. In particular, we interpret our conditions 
over model primitives as enforcement of a form of "decreasing returns to larger states" for an individual 
firm, that yields fragmented industries in the limit. By contrast, as we discuss, violation of these conditions 
suggests "increasing returns to larger states" and potential market concentration. 
Our main results are described in detail below. 

1. Theoretical foundations for SE: Existence of SE. We provide natural conditions over model primitives
that guarantee existence of SE over unbounded state spaces. This is distinct from prior work on SE, 
which typically studies models with compact state spaces. Crucially, considering unbounded state 
spaces allows us to obtain sharp distinctions between increasing and decreasing returns to higher 
states, and the resulting concentration or fragmentation of an industry. 

In addition, even though SE of a given model may exist over any compact state space, it may fail to 
exist over an unbounded state space. The reason is that agents may have incentives to grow unbound- 
edly large and in this case the steady-state distribution is not well defined. Hence, a key aspect of our 
conditions is that they ensure the stability of the stochastic process that describes each agent's state 
evolution, and that the resulting steady-state distribution is well defined. In this way, we guarantee 
the compactness of an appropriately defined "best-response" correspondence. Our conditions also en- 
sure the continuity and convexity of this correspondence, allowing us to use a topological fixed-point 
approach to prove existence. 

2. Theoretical foundations for SE: Approximating MPE. We show that the same conditions over model
primitives that ensure the existence of SE also imply that SE of infinite models approximate MPE of
models with a finite number of players well as the number of agents increases. An important condition
that is required for this approximation result to hold is that the distribution of players' states in the 
SE under consideration must possess a light tail, as originally observed in Weintraub et al. (2008)
for a sequence of finite games, and in Weintraub et al. (2011) for a limiting infinite model like the
one studied in this paper. In a light-tailed equilibrium, no single agent is "dominant"; without such
a condition it is not possible for agents to rationally ignore the state fluctuations of their dominant
competitors. 

Crucially, the light-tail assumption as used in Weintraub et al. (2008) and Weintraub et al. (2011) is an
endogenous condition on the equilibrium outcome. A central contribution of this work is to develop 
exogenous conditions over model primitives that ensure the existence of light-tailed SE. In fact, the 
conditions that guarantee compactness in the existence result ensure that all SE are light-tailed. Thus 
approximation need not be verified separately; verification of our conditions simultaneously guaran- 
tees existence of SE as well as a good approximation to MPE as the number of agents increases. 

3. Market structure in dynamic industries. Our results provide important insights into market structure 
in dynamic industries. The literature on dynamic oligopoly models has largely studied individual
industries in which market outcomes are very sensitive to certain model features and parameters
(Doraszelski and Pakes 2007). In contrast, our results provide conditions for which we can predict important
features of the equilibrium market structure for a broad range of parameters and specifications. 

In particular, our conditions over model primitives imply that all SE are light-tailed, and therefore, in 
all SE the industry yields a fragmented market structure and no dominant firms emerge. Moreover, all 
these SE are valid approximations to MPE. While these conditions cannot pin down the equilibrium
exactly, they guarantee that in all of them the market structure is fragmented. In that sense, our work 
contributes to the "bounds approach" in the industrial organization literature pioneered by Sutton 
(1991), which aims to identify broad structural properties in industries that would yield a fragmented 
or a concentrated market structure. A novelty of our analysis compared to previous work is that it is 
done in a fully dynamic framework.

To illustrate the connection between our theoretical results and market structure in dynamic industries, 
we study in detail a collection of three examples in industrial organization. For each of these examples, 
we demonstrate that our conditions on model primitives that guarantee existence of light-tailed SE can be 
interpreted as enforcing "decreasing returns to higher states." Conversely, our analysis of the examples 
suggests that when these conditions are violated, the resulting models exhibit "increasing returns to higher 
states," and SE are not expected to provide accurate approximations or may not even exist. We note that, as 
emphasized above, unbounded state spaces are necessary to highlight the difference between increasing and 
decreasing returns to higher states. 

The first example we discuss is a quality-ladder dynamic oligopoly model where firms can invest to 
improve a firm-specific state; e.g., a firm might invest in advertising to improve brand awareness, or invest 
in R&D to improve product quality (Pakes and McGuire 1994). Firms' single period profits are determined 
through a monopolistic competition model. Through a limiting construction where the number of firms and 
market size both scale to infinity, we use our conditions to show that light-tailed SE exist and approximate 
MPE asymptotically if the single period profit function exhibits diminishing marginal returns to higher 
quality. 

Next, we discuss a model with positive spillovers between firms (Griliches 1998). Here our conditions 
impose a form of decreasing returns in the spillover effect that, together with the decreasing returns to 
investment condition introduced in the previous model, ensure SE exist and provide good approximations to 
MPE. When the spillover effect is controlled in this way, the market is more likely to fragment. 

Finally, we discuss a dynamic oligopoly that incorporates "learning-by-doing", so that firms become 
more efficient as they gain experience in the marketplace (Fudenberg and Tirole 1983). In this case, we 
find that firms' learning processes must exhibit decreasing returns to scale to ensure existence of light-tailed 
SE. These conditions are consistent with prior observations in the literature that suggest industries with 
prominent learning-by-doing effects will tend to concentrate; our results compactly quantify such intuition. 

Indeed, in all these examples, our results validate intuition by providing quantifiable insight into market 
structure. Industries with increasing returns are typically concentrated and dominated by few firms, so SE 
would not be good approximations. By contrast, our conditions on model primitives delineate a broad range 
of industries with decreasing returns that become fragmented in the limit and for which SE provide accurate 
approximations.

The remainder of the paper is organized as follows. Section 2 describes related literature. Section 3
introduces our stochastic game model, and there we define both MPE and SE. We then preview our results 
and discuss the motivating examples above in detail in Section 4. In Section 5, we develop exogenous 
conditions over model primitives that ensure existence of light-tailed SE. In Section 6, we show that under 
our conditions any light-tailed SE approximates MPE asymptotically. Section 7 revisits the examples in 
light of the theoretical results provided in the two previous sections. We conclude and discuss future research 
directions in Section 8. The appendices contain all mathematical proofs as well as important complementary 
material. 

2 Related Work 

Our work is related to previous literature that studies stationary equilibria or closely-related equilibrium 
concepts. SE is sometimes called mean field equilibrium because of its relationship to mean field mod- 
els in physics, where large systems exhibit macroscopic behavior that is considerably more tractable than 
their microscopic description. (See, e.g., Blume (1993) and Morris (2000) for related ideas applied to
static games.) In the context of stochastic games, SE and related approaches have been proposed under a 
variety of monikers across economics and engineering; see, e.g., studies of anonymous sequential games 
(Jovanovic and Rosenthal 1988, Bergin and Bernhardt 1995); dynamic stochastic general equilibrium in 
macroeconomic modeling (Stokey et al. 1989); Nash certainty equivalent control (Huang et al. 2006, 2007); 
mean field games (Lasry and Lions 2007); and dynamic user equilibrium (Friesz et al. 1993). SE has also 
been studied in recent works on information percolation models (Duffie et al. 2009), sensitivity analysis in
aggregate games (Acemoglu and Jensen 2009), coupling of oscillators (Yin et al. 2010), scaling behavior 
of markets (Bodoh-Creed 2011), and in analysis of stochastic games with complementarities (Adlakha and 
Johari 2010). 

Prior work has considered existence of equilibrium in stochastic games in general, but existence results are
typically established only in restricted classes such as zero-sum games and games of identical interest; see Mertens
et al. (1994) for background. Doraszelski and Satterthwaite (2010) and Escobar (2008) show existence 
of MPE for different classes of stochastic games under appropriate concavity assumptions. Our work is 
particularly related to Jovanovic and Rosenthal (1988) and Hopenhayn (1992) that consider existence of SE. 
The former paper considers a model similar to ours but restricts attention to compact sets, while the latter 
paper is focused on a specific model of oligopoly competition. Adlakha and Johari (2010) also consider 
existence of SE; they focus on games with strategic complementarities, and establish existence using a 
constructive approach based on lattice theoretic methods. The preceding three papers study a different 
setting from ours and do not establish an approximation theorem. Several prior papers have considered various
notions of approximation properties for SE in specific settings, either with bounded state spaces (Glynn 
2004, Tembine et al. 2009, Bodoh-Creed 2011) or with an exogenous compactness assumption (Adlakha 
et al. 2010), or in linear-quadratic payoff models (Huang et al. 2007, Adlakha et al. 2008). 

We briefly discuss here the relation to our own prior work. In our previous conference papers (Adlakha
et al. 2008, 2010), we study SE in a less general model of stochastic games than this paper. Though we
study existence of SE and an appropriate approximation property there, we make an endogenous assumption of
compactness; in other words, we assume the model is such that in searching for SE we can restrict attention 
to a compact set. As a result, those results do not relate model primitives either to the validity of SE as an
approximation or to market structure. By contrast, in this paper, we derive exogenous conditions on model
primitives that guarantee compactness, existence of SE, and an appropriate approximation property. In 
addition, as a consequence, we are able to apply our results to derive sharp insight into market structure. 

Our paper is also closely related to Weintraub et al. (2011), who study a class of industry dynamic 
models. They also show a result that depends endogenously on SE: if a given SE satisfies an appropriate 
light-tail condition, then it approximates MPE well as the number of firms grows. Our paper provides 
several important contributions with respect to Weintraub et al. (2011). First, we consider a more general 
stochastic game model that allows us, for example, to study the models with spillovers and learning-by- 
doing. On the other hand, we do not consider entry and exit as they do; we discuss this extension in the 
conclusions section. We also consider a stronger approximation property. Second, and more importantly, 
the light-tail condition used to prove the approximation result in Weintraub et al. (2011) is a condition over 
equilibrium outcomes; by contrast, we provide conditions over model primitives that guarantee all SE are 
light-tailed and hence approximate MPE asymptotically. As a consequence, these conditions also give sharp 
insight into market structure in our paper. Finally, we provide a novel result pertaining to existence of SE, 
particularly over unbounded state spaces. We close by noting that Weintraub et al. (2011) also consider 
an analog of SE called "oblivious equilibrium" (OE) in models with finitely many agents. They study the 
relation between OE and SE by analyzing the hemicontinuity of the OE correspondence at the point where
the number of firms becomes infinite.

3 Preliminaries and Definitions 

In this section we define our general model of a stochastic game, and proceed to define two equilibrium 
concepts: Markov perfect equilibrium (MPE) and stationary equilibrium (SE). We conclude by defining the 
asymptotic Markov equilibrium property, which requires that SE approximates MPE well as the number of 
players grows large. 

3.1 Stochastic Game Model 

In this section, we describe our stochastic game model. Compared to standard stochastic games in the 
literature (Shapley 1953), in our model, every player has an individual state. Players are coupled through 
their payoffs and state transitions. A stochastic game has the following elements: 

Time. The game is played in discrete time. We index time periods by $t = 0, 1, 2, \ldots$.

Players. There are m players in the game; we use i to denote a particular player. 
State. The state of player i at time t is denoted by $x_{i,t} \in \mathcal{X}$, where $\mathcal{X} \subseteq \mathbb{Z}^d$ is a subset of the
d-dimensional integer lattice. We use $x_t$ to denote the state of all players at time t and $x_{-i,t}$ to denote the state
of all players except player i at time t. For indication of how to proceed with compact but not necessarily
discrete state spaces, we refer the reader to the recent independent work of Bodoh-Creed (2011).

Action. The action taken by player i at time t is denoted by $a_{i,t} \in \mathcal{A}$, where $\mathcal{A} \subseteq \mathbb{R}^q$ is a subset of the
q-dimensional Euclidean space. We use $a_t$ to denote the action of all players at time t.

Transition Probabilities. The state of a player evolves in a Markov fashion. Formally, let
$h_t = \{x_0, a_0, \ldots, x_{t-1}, a_{t-1}\}$ denote the history up to time t. Conditional on $h_t$, players' states at time t are
independent of each other. This assumption implies that random shocks are idiosyncratic, ruling out
aggregate random shocks that are common to all players. The assumption is important to derive our asymptotic
results. Player i's state $x_{i,t}$ at time t depends on the past history $h_t$ only through the state of player i at time
$t-1$, $x_{i,t-1}$; the states of other players at time $t-1$, $x_{-i,t-1}$; and the action taken by player i at time $t-1$,
$a_{i,t-1}$. We represent the distribution of the next state as a transition kernel $\mathbf{P}$, where:

$$\mathbf{P}(x_i' \mid x_i, a_i, x_{-i}) = \mathrm{Prob}\left(x_{i,t+1} = x_i' \mid x_{i,t} = x_i,\ a_{i,t} = a_i,\ x_{-i,t} = x_{-i}\right). \qquad (1)$$

Payoff. In a given time period, if the state of player i is $x_i$, the state of other players is $x_{-i}$, and the
action taken by player i is $a_i$, then the single period payoff to player i is $\pi(x_i, a_i, x_{-i}) \in \mathbb{R}$.

Discount Factor. The players discount their future payoff by a discount factor $0 < \beta < 1$. Thus a player
i's infinite horizon payoff is given by: $\sum_{t=0}^{\infty} \beta^t \pi(x_{i,t}, a_{i,t}, x_{-i,t})$.

In a variety of games, coupling between players is independent of the identity of the players. The notion 
of anonymity captures scenarios where the interaction between players is via aggregate information about 
the state (e.g., see Jovanovic and Rosenthal 1988). Let $f^{(m)}_{-i,t}(y)$ denote the fraction of players (excluding
player i) that have their state as y at time t, i.e.:

$$f^{(m)}_{-i,t}(y) = \frac{1}{m-1} \sum_{j \neq i} \mathbf{1}_{\{x_{j,t} = y\}}, \qquad (2)$$

where $\mathbf{1}_{\{x_{j,t} = y\}}$ is the indicator function that the state of player j at time t is y. We refer to $f^{(m)}_{-i,t}$ as the
population state at time t (from player i's point of view).
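
As a minimal sketch of equation (2), assuming one-dimensional integer states stored in a Python list, the population state seen by player i can be computed as below. The function name `population_state` and the sample states are hypothetical and only serve as illustration.

```python
from collections import Counter
from fractions import Fraction

def population_state(states, i):
    """Empirical distribution of the states of all players except player i.

    `states` is a list with one entry per player (hypothetical encoding).
    Returns a dict mapping each occupied state y to its fraction f(y).
    """
    m = len(states)
    if m < 2:
        raise ValueError("need at least two players")
    others = states[:i] + states[i + 1:]          # drop player i
    counts = Counter(others)
    return {y: Fraction(c, m - 1) for y, c in counts.items()}

# Hypothetical example with m = 4 players; player 0 looks at the other three.
states = [0, 2, 2, 1]
f = population_state(states, 0)
print(f)
```

States that no other player occupies are simply absent from the dictionary, which corresponds to $f(y) = 0$.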

Definition 1 (Anonymous Stochastic Game). A stochastic game is called an anonymous stochastic game
if the payoff function $\pi(x_{i,t}, a_{i,t}, x_{-i,t})$ and transition kernel $\mathbf{P}(x_{i,t}' \mid x_{i,t}, a_{i,t}, x_{-i,t})$ depend on $x_{-i,t}$
only through $f^{(m)}_{-i,t}$. In an abuse of notation, we write $\pi(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t})$ for the payoff to player i, and
$\mathbf{P}(x_{i,t}' \mid x_{i,t}, a_{i,t}, f^{(m)}_{-i,t})$ for the transition kernel for player i.

For the remainder of the paper, we focus our attention on anonymous stochastic games. For ease of 
notation, we often drop the subscripts i and t and denote a generic transition kernel by $\mathbf{P}(\cdot \mid x, a, f)$, and a
generic payoff function by $\pi(x, a, f)$, where $f$ represents the population state of players other than the player
under consideration. Anonymity requires that a firm's single period payoff and transition kernel depend on
the states of other firms via their empirical distribution over the state space, and not on their specific identity.
The examples we discuss in the next section satisfy this assumption. In addition, in an anonymous stochastic
game the functional form of the payoff function is the same, regardless of the number of players m. In
that sense, we often interpret the profit function $\pi(x, a, f)$ as representing a limiting regime in which the
number of agents is infinite. In Section 4 we discuss how to derive this limiting profit function in different
applications. Moreover, in Appendix B we briefly discuss how our results can be extended to include the
case where there is a sequence of payoff functions that depends on the number of agents. 

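We introduce some additional useful notation. Let $\mathfrak{F}$ be the set of all possible population states on $\mathcal{X}$:

$$\mathfrak{F} = \Big\{ f : \mathcal{X} \to [0, 1] \ \Big|\ f(x) \geq 0,\ \sum_{x \in \mathcal{X}} f(x) = 1 \Big\}. \qquad (3)$$

In addition, we let $\mathfrak{F}^{(m)}$ denote the set of all population states in $\mathfrak{F}$ over $m - 1$ players, i.e.:

$$\mathfrak{F}^{(m)} = \Big\{ f \in \mathfrak{F} : \text{there exists } x \in \mathcal{X}^{m-1} \text{ with } f(y) = \frac{1}{m-1} \sum_{j} \mathbf{1}_{\{x_j = y\}},\ \forall y \in \mathcal{X} \Big\}.$$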

3.2 Markov Perfect Equilibrium 

In studying stochastic games, attention is typically focused on Markov strategies, where the action of a 
player at each time is a function of only the current state of every player (Fudenberg and Tirole 1991, Maskin
and Tirole 1988). In the context of anonymous stochastic games, a Markov strategy depends on the current 
state of the player as well as the current population state. Because a player using such a strategy tracks the 
evolution of the other players, we refer to such strategies in our context as cognizant strategies. 

Definition 2. Let $\mathfrak{M}$ be the set of cognizant strategies available to a player. That is,
$\mathfrak{M} = \{\mu \mid \mu : \mathcal{X} \times \mathfrak{F} \to \mathcal{A}\}$.

Consider an m-player anonymous stochastic game. At every time t, player i chooses an action $a_{i,t}$ that
depends on its current state and on the current population state $f^{(m)}_{-i,t} \in \mathfrak{F}^{(m)}$. Letting $\mu_i \in \mathfrak{M}$ denote the
cognizant strategy used by player i, we have $a_{i,t} = \mu_i(x_{i,t}, f^{(m)}_{-i,t})$. The next state of player i is randomly
drawn according to the kernel $\mathbf{P}$:

$$x_{i,t+1} \sim \mathbf{P}\big(\cdot \mid x_{i,t}, a_{i,t}, f^{(m)}_{-i,t}\big). \qquad (4)$$



We let $\mu^{(m)}$ denote the strategy vector where every player has chosen strategy $\mu$. Define $V^{(m)}(x, f \mid \mu', \mu^{(m-1)})$
to be the expected net present value for a player with initial state x, and with initial population state
$f \in \mathfrak{F}^{(m)}$, given that the player follows a strategy $\mu'$ and every other player follows the strategy $\mu$. In particular,
we have

$$V^{(m)}(x, f \mid \mu', \mu^{(m-1)}) \triangleq \mathrm{E}\left[ \sum_{t=0}^{\infty} \beta^t \pi\big(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t}\big) \ \Big|\ x_{i,0} = x,\ f^{(m)}_{-i,0} = f;\ \mu_i = \mu',\ \mu_{-i} = \mu^{(m-1)} \right], \qquad (5)$$

where $\mu_{-i}$ denotes the strategies employed by every player except i. Note that state sequence $x_{i,t}$ and
population state sequence $f^{(m)}_{-i,t}$ evolve according to the transition dynamics (4).

We focus our attention on a symmetric Markov perfect equilibrium (MPE), where all players use the 
same cognizant strategy $\mu$. In an abuse of notation, we write $V^{(m)}(x, f \mid \mu^{(m)})$ to refer to the expected
discounted value as given in equation (5) when every player follows the same cognizant strategy $\mu$.

Definition 3 (Markov Perfect Equilibrium). The vector of cognizant strategies $\mu^{(m)}$, with $\mu \in \mathfrak{M}$, is a symmetric
Markov perfect equilibrium (MPE) if for all initial states $x \in \mathcal{X}$ and population states $f \in \mathfrak{F}^{(m)}$ we have

$$\sup_{\mu' \in \mathfrak{M}} V^{(m)}(x, f \mid \mu', \mu^{(m-1)}) = V^{(m)}(x, f \mid \mu^{(m)}).$$

Thus, a Markov perfect equilibrium is a profile of cognizant strategies that simultaneously maximize 
the expected discounted payoff for every player, given the strategies of other players.$^{1}$ It is a well known
fact that computing a Markov perfect equilibrium for a stochastic game is computationally challenging in 
general (Doraszelski and Pakes 2007). This is because to find an optimal cognizant strategy, each player 
needs to track and forecast the exact evolution of the entire population state. In certain scenarios, it might 
be infeasible to exchange or learn this information at every step because of limited communication capacity 
between players or limited cognitive ability. Moreover, even if this is possible, the computation of an 
optimal cognizant strategy is subject to a curse of dimensionality; the state space $\mathfrak{F}^{(m)}$ grows too quickly
as the number of agents m and/or the number of individual states X becomes large. As a consequence, 
computing Markov perfect equilibrium in practice is only possible in models with few agents and few 
individual states, severely restricting the set of problems for which this equilibrium concept can be used. 
In the next subsection, we describe a scheme for approximating Markov perfect equilibrium that alleviates 
these difficulties. 
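
To give a rough sense of this curse of dimensionality, the following sketch counts population states when the state space is truncated to n individual states. The truncation and the function names are assumptions made for illustration, since the model itself allows an unbounded state space. With m - 1 other players, a population state is a multiset of size m - 1 drawn from n states, so a stars-and-bars count applies.

```python
from math import comb

def num_population_states(m, n):
    """Number of population states over m - 1 players on n individual states.

    Stars and bars: multisets of size m - 1 drawn from n states.
    """
    return comb(m + n - 2, n - 1)

def cognizant_domain_size(m, n):
    """Number of (own state, population state) pairs a cognizant strategy must cover."""
    return n * num_population_states(m, n)

# Hypothetical sizes: 10 individual states and a growing number of players.
for m in (5, 20, 50):
    print(m, cognizant_domain_size(m, 10))
```

By comparison, an oblivious strategy on the same truncated state space needs only n entries, one per individual state.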

3.3 Stationary Equilibrium 

In a game with a large number of players, we might expect that fluctuations of players' states "average out" 
and hence the actual population state remains roughly constant over time. Because the effect of other players 
on a single player's payoff and transition probabilities is only via the population state, it is intuitive that, as 
the number of players increases, a single player has negligible effect on the outcome of the game. Based on 
this intuition, related schemes for approximating MPE have been proposed in different application domains 
via a solution concept we call stationary equilibrium or SE (see Sections 1 and 2 for references on SE and 
related work). 

We consider a limiting model with an infinite number of agents in which a law of large numbers holds 
exactly. In an SE of this model, each player optimizes its payoff assuming the population state is fixed at 
its long-run average. Thus, rather than keep track of the exact population state, a single player's immediate 
action depends only on his own current state. We call such players oblivious, and refer to their strategies as 
oblivious strategies. (This terminology is due to Weintraub et al. 2008.) Formally, we let $\mathfrak{M}_O$ denote the set
of (stationary, nonrandomized) oblivious strategies, defined as follows.

Definition 4. Let $\mathfrak{M}_O$ be the set of oblivious strategies available to a player. That is,
$\mathfrak{M}_O = \{\mu \mid \mu : \mathcal{X} \to \mathcal{A}\}$.

$^{1}$Under the assumptions we make later in this paper, it can be shown that for any vector of cognizant strategies of players other
than i, an optimal cognizant strategy always exists for player i.

Given a strategy $\mu \in \mathfrak{M}_O$, an oblivious player i takes an action $a_{i,t} = \mu(x_{i,t})$ at time t; as before, the
next state of the player is randomly distributed according to the transition kernel $\mathbf{P}$:

$$x_{i,t+1} \sim \mathbf{P}\big(\cdot \mid x_{i,t}, \mu(x_{i,t}), f\big). \qquad (6)$$

Note that because we are considering a limiting model, the player's state evolves according to a transition
kernel with fixed population state $f$. The interpretation is that a single player conjectures the population
state to be $f$; therefore, in determining a player's future expected payoff stream, it considers a transition
kernel where its own state evolution is affected by the fixed population state $f$.

We define the oblivious value function $V(x \mid \mu, f)$ to be the expected net present value for any oblivious
player with initial state x, when the long run average population state is $f$, and the player uses an oblivious
strategy $\mu$. We have

$$V(x \mid \mu, f) \triangleq \mathrm{E}\left[ \sum_{t=0}^{\infty} \beta^t \pi(x_{i,t}, a_{i,t}, f) \ \Big|\ x_{i,0} = x;\ \mu \right]. \qquad (7)$$

Note that the state sequence $x_{i,t}$ is determined by the strategy $\mu$ according to the dynamics (6), where the
population state is fixed at $f$. We define the optimal oblivious value function $V^*(x \mid f)$ as
$V^*(x \mid f) = \sup_{\mu \in \mathfrak{M}_O} V(x \mid \mu, f)$. Given a population state $f$, an oblivious player computes an optimal strategy by
maximizing its oblivious value function. Note that because an oblivious player does not track the evolution
of the population state and its state evolution depends only on the population state $f$, if an optimal stationary
nonrandomized strategy exists, it will only be a function of the player's current state, i.e., it must be
oblivious even if optimizing over cognizant strategies. We capture this optimization step via the correspondence
$\mathcal{P}$ defined next.

Definition 5. The correspondence $\mathcal{P} : \mathfrak{F} \to \mathfrak{M}_O$ maps a distribution $f \in \mathfrak{F}$ to the set of optimal oblivious
strategies for a player. That is, $\mu \in \mathcal{P}(f)$ if and only if $V(x \mid \mu, f) = V^*(x \mid f)$ for all x.

Note that $\mathcal{P}$ maps a distribution to a stationary, nonrandomized oblivious strategy. This is typically
without loss of generality, since in most models of interest there always exists such an optimal strategy. We
later establish that under our assumptions $\mathcal{P}(f)$ is nonempty.
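
For intuition, the optimization behind Definition 5 reduces to a standard single-agent dynamic program once $f$ is held fixed. The following is a minimal sketch using value iteration, assuming a finite (truncated) state set and a finite action grid; the function name, the toy payoff, and the toy kernel are hypothetical and are not the paper's examples.

```python
def oblivious_best_response(states, actions, payoff, kernel, f, beta,
                            tol=1e-10, max_iter=10_000):
    """Value iteration for one player facing a fixed population state f.

    payoff(x, a, f) returns a float; kernel(x, a, f) returns a dict mapping
    each next state to its probability. Returns (V_star, mu) as dicts by state.
    """
    def q(x, a, V):
        # One-step payoff plus discounted continuation value.
        return payoff(x, a, f) + beta * sum(p * V[y] for y, p in kernel(x, a, f).items())

    V = {x: 0.0 for x in states}
    for _ in range(max_iter):
        V_new = {x: max(q(x, a, V) for a in actions) for x in states}
        gap = max(abs(V_new[x] - V[x]) for x in states)
        V = V_new
        if gap < tol:
            break
    # Greedy (stationary, nonrandomized) strategy with respect to the final values.
    mu = {x: max(actions, key=lambda a: q(x, a, V)) for x in states}
    return V, mu

# Hypothetical toy model: states 0..3, invest (a = 1) or not (a = 0).
STATES, ACTIONS, BETA = [0, 1, 2, 3], [0, 1], 0.9

def toy_payoff(x, a, f):
    mean_state = sum(y * w for y, w in f.items())   # competitors' average state
    return x / (1.0 + mean_state) - 0.1 * a

def toy_kernel(x, a, f):
    if a == 1:
        up = min(x + 1, STATES[-1])
        return {up: 0.6, x: 0.4} if up != x else {x: 1.0}
    down = max(x - 1, 0)
    return {down: 0.3, x: 0.7} if down != x else {x: 1.0}

f = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
V_star, mu = oblivious_best_response(STATES, ACTIONS, toy_payoff, toy_kernel, f, BETA)
print(mu)
```

The returned strategy depends only on the player's own state, as in Definition 4; ties between actions are broken by the order of `actions`, so the sketch returns one element of the best-response set rather than the whole set.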

Now suppose that the population state is $f$, and all players are oblivious and play using a stationary
strategy $\mu$. Because of averaging effects, we expect that if the number of agents is large, then the long run
population state should in fact be an invariant distribution of the Markov process on $\mathcal{X}$ that describes the
evolution of an individual agent, with transition kernel (6). We capture this relationship via the correspondence
$\mathcal{D}$, defined next.

Definition 6. The correspondence $\mathcal{D} : \mathfrak{M}_O \times \mathfrak{F} \to \mathfrak{F}$ maps the oblivious strategy $\mu$ and population state $f$
to the set of invariant distributions $\mathcal{D}(\mu, f)$ associated with the dynamics (6).

Note that the image of the correspondence $\mathcal{D}$ is empty if the strategy does not result in an invariant
distribution. We later establish conditions under which $\mathcal{D}(\mu, f)$ is nonempty. In addition, while we do not
impose this restriction a priori, there are many models of interest where $\mathcal{D}$ is actually a function; that is,
for all $\mu$ and $f$ the Markov process associated with the dynamics (6) will be ergodic and admit a unique
invariant distribution.
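For a finite (or truncated) state space, an element of $\mathcal{D}(\mu, f)$ can be approximated numerically once $\mu$ and $f$ have been substituted into the kernel. The following is a minimal sketch using power iteration; the function name and the two-state matrix are hypothetical, and the method is only expected to converge when the chain is ergodic.

```python
def invariant_distribution(P, tol=1e-12, max_iter=1_000_000):
    """Approximate an invariant distribution of a row-stochastic matrix P.

    P[x][y] is the probability of moving from state x to state y under a fixed
    strategy mu and a fixed population state f. Power iteration is used, which
    assumes the chain is ergodic; otherwise it may fail to converge.
    """
    n = len(P)
    f = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new_f = [sum(f[x] * P[x][y] for x in range(n)) for y in range(n)]
        if max(abs(new_f[y] - f[y]) for y in range(n)) < tol:
            return new_f
        f = new_f
    return f


# Hypothetical two-state chain, used only to exercise the function.
P_example = [[0.9, 0.1],
             [0.5, 0.5]]
f_example = invariant_distribution(P_example)
```

For this example the balance equation $0.1\, f(0) = 0.5\, f(1)$ gives the invariant distribution $(5/6, 1/6)$, which the iteration reproduces up to the tolerance.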

We can now define stationary equilibrium. If every agent conjectures that $f$ is the long run population
state, then every agent would prefer to play an optimal oblivious strategy $\mu$. On the other hand, if every
agent plays $\mu$ and the population state is in fact $f$, then we should expect the long run population state of
all players to be an invariant distribution of (6). Stationary equilibrium requires a consistency condition: the
equilibrium population state $f$ must in fact be an invariant distribution of the dynamics (6) under the strategy
$\mu$ and the same population state $f$.

Definition 7 (Stationary Equilibrium). An oblivious strategy $\mu \in \mathfrak{M}_O$ and a distribution $f \in \mathfrak{F}$ constitute a
stationary equilibrium (SE) if $\mu \in \mathcal{P}(f)$ and $f \in \mathcal{D}(\mu, f)$.

In the event that the Markov chain induced by $\mu$ and $f$ has multiple invariant distributions, the agents
must all conjecture the population state in equilibrium to be $f$. Further, in the event that there exist multiple
optimal strategies given $f$, the agents must all choose to play $\mu$. In many models of interest (such as the
examples presented in Section 4), both $\mathcal{P}$ and $\mathcal{D}$ are singletons, so such problems do not arise. For later
reference, we define the correspondence $\Phi : \mathfrak{F} \to \mathfrak{F}$ as follows:

$$\Phi(f) = \mathcal{D}(\mathcal{P}(f), f). \tag{8}$$

Observe that with this definition, a pair $(\mu, f)$ is an SE if and only if $f$ is a fixed point of $\Phi$, i.e., $f \in \Phi(f)$, such
that $\mu \in \mathcal{P}(f)$ and $f \in \mathcal{D}(\mu, f)$.

3.4 Approximation 

A central goal of this paper is to determine conditions under which SE provides a good approximation to 
MPE as the number of players grows large. Here we formalize the approximation property of interest, 
referred to as the asymptotic Markov equilibrium (AME) property. Intuitively, this property requires that a 
stationary equilibrium strategy is approximately optimal even when compared against Markov strategies, as 
the number of players grows large. 

Definition 8 (Asymptotic Markov Equilibrium). A stationary equilibrium $(\mu, f)$ possesses the asymptotic
Markov equilibrium (AME) property if for all states $x$ and sequences of cognizant strategies $\mu_m \in \mathfrak{M}$ we
have:

$$\limsup_{m \to \infty} \left[ V^{(m)}\big(x, f^{(m)} \mid \mu_m, \mu^{(m-1)}\big) - V^{(m)}\big(x, f^{(m)} \mid \mu^{(m)}\big) \right] \le 0, \tag{9}$$

almost surely, where the initial population state $f^{(m)}$ is derived by sampling each other player's initial state
independently from the probability mass function $f$.

Note that $V^{(m)}(x, f^{(m)} \mid \mu', \mu^{(m-1)})$ is the actual value function of a player as defined in equation (5),
when the player uses a cognizant strategy $\mu'$ and every other player plays an oblivious strategy $\mu$. Similarly,
$V^{(m)}(x, f^{(m)} \mid \mu^{(m)})$ is the actual value function of a player as defined in equation (5) when every player
is playing the oblivious strategy $\mu$. AME requires that the error when using the SE strategy approaches
zero almost surely with respect to the randomness in the initial population state. Hence, AME requires that
the SE strategy becomes approximately optimal as the number of agents grows, with respect to population
states that have nonzero probability of occurrence when sampling individual states according to the invariant
distribution.$^2$ This definition can be shown to be stronger than the definition considered by Weintraub et al.
(2008), where AME is defined only in expectation with respect to randomness in the initial population state.

3.5 Extensions to the Basic Model 

We briefly mention two extensions for which all our results follow. These extensions are often important in 
applications, but do not require any significant technical arguments. See Appendix A for further details. 

First, note that players are ex-ante homogeneous in the model considered, in the sense that they share the 
same model primitives. This is not a particularly consequential choice, and is made primarily for notational 
convenience; indeed, by an appropriate redefinition of state we can model agent heterogeneity via types. 

Second, note that in the game defined here, players are coupled through their states: both the transition 
kernel and the payoff depend on the current state of all players. However, in many models of interest the 
transition kernel and payoff of a player may depend on both the current state and current actions of other 
players. In particular, the example in Section 4.3 is a model where players are coupled through their actions. 
All the results of this paper naturally extend to a setting where players may also be coupled through their 
actions, i.e., where the transition kernel and payoff may depend on the current actions of all players as well. 
In the context of this paper, when players are coupled through actions, for technical simplicity we focus on 
finite action spaces. In this setting, to ensure existence of equilibrium, we assume that players maximize 
payoffs with respect to randomized strategies. In addition, we briefly discuss how our results could be 
extended to include continuous action spaces as well (see Appendix A). 

4 Preview of Results and Motivating Examples 

As discussed in the Introduction, this paper makes two complementary contributions. On one hand, we es- 
tablish sufficient conditions over model primitives that provide justification for use of SE (in particular, that 
guarantee existence of SE and the AME property). On the other hand, we demonstrate that our conditions 
encode an economic dichotomy, broadly, between "increasing" and "decreasing" returns to higher states; 
the latter corresponds to those models where the industry becomes fragmented in the limit and SE is an 
appropriate modeling tool. In this way, our conditions directly provide insight into market structure. 

This section is devoted to introducing examples drawn from industrial organization that motivate and 
illustrate our results. Each example presents the same basic difficulty: in terms of the parameters of the 
model, where does the boundary lie between those markets where fragmentation might arise, and those 
markets where concentration might be expected? As suggested by the preceding discussion, we use SE as 
a tool to inform this market structure question. In each example, we discuss how our technical results yield
sharp conditions under which SE exist, the AME property holds, and all SE yield market fragmentation. We
also discuss how failure of the conditions would suggest market concentration.

2 As noted earlier, under the assumptions we make an optimal cognizant strategy can be shown to exist, for any vector of
cognizant strategies of the opponents. Therefore the AME property can be equivalently stated as the requirement that for all $x$:

$\lim_{m \to \infty} \left( \sup_{\mu_m \in \mathfrak{M}} V^{(m)}\big(x, f^{(m)} \mid \mu_m, \mu^{(m-1)}\big) - V^{(m)}\big(x, f^{(m)} \mid \mu^{(m)}\big) \right) = 0$, almost surely.

To set the stage, we first briefly preview the approach behind our main technical results (see Sections 5
and 6). The mathematical complexity in our analysis arises due to unbounded state spaces; these are essential
if we hope to identify a boundary between fragmentation and concentration in the limit of many firms. 
Unfortunately, with unbounded state spaces, both existence of SE and the AME property may become 
difficult to establish. Informally, this is because mass in the population state may "escape" to larger states 
as the number of firms grows; alternatively, firms may choose strategies that lead to unbounded steady state 
distributions over the state space. 

The key condition we require to overcome these hurdles is to ensure that SE have light tails, i.e., limited 
mass at larger states (in a sense we make precise later). We develop exogenous conditions over model 
primitives that ensure all SE population states have light tails, and we further show that all light-tailed SE 
satisfy the AME property (extending a prior result of Weintraub et al. (2011)). Light tails ensure that no 
single dominant agent emerges in the limit of many firms. Note that in market structure terms, this is exactly
market fragmentation.

Interpretation of our exogenous conditions reveals exactly the dichotomy introduced above: the con- 
ditions enforce a form of "decreasing returns to higher states" in the optimization problem faced by an 
individual agent, while their failure corresponds roughly to "increasing returns." Notably, all our results 
in the examples are simply applications of the same theoretical architecture. As we point out, when the 
examples below violate the assumptions we require — in particular, in models that exhibit increasing returns 
to higher states — we also expect that SE will not satisfy the AME property, and indeed, may not exist. Thus 
despite the fact that we only discuss sufficient conditions for existence and approximation in this paper, 
the examples suggest that perhaps these sufficient conditions identify a reasonable boundary between those 
models that admit analysis via SE, and those that do not. 

For the rest of this section, we consider stochastic games with $m$ players in which the state of a player
takes values on $\mathbb{Z}_+$.

4.1 Dynamic Oligopoly Models 

Dynamic oligopoly models have received significant attention in the recent industrial organization literature 
(see Doraszelski and Pakes 2007 for a survey). In these models, firms' states correspond to some variable 
that affects profitability; for example, the state could represent the firm's product quality, its current produc- 
tivity level, or its capacity. Per period profits are based on a static competition game, with heterogeneity 
among firms determined by their respective quality levels. Firms take actions to improve their quality; in 
the absence of this investment quality degrades over time. 

Such models are extremely broad and capture a wide range of dynamic phenomena in industrial orga- 
nization. In this context, we address the following important question: under what conditions on the model 
primitives do we obtain concentration of the market, and under what conditions do we obtain fragmentation? 
Intuitively, we might expect that firms need to exhibit decreasing returns to their investments to obtain
fragmentation. Our technical results yield a simple condition on model primitives that formalizes this intuition:
we require that the single stage profit function exhibits decreasing returns to firm quality. In this case SE
exist, the AME property holds, and the market structure is fragmented in the limit.
We now describe our specific model and our result in more detail. 

States. For concreteness, here we consider the quality ladder model of Pakes and McGuire (1994), where
the state $x_{i,t} \in \mathbb{Z}_+$ represents the quality of the product produced by firm $i$ at time $t$.

Actions. Investments improve the state variable over time. At each time $t$, firm $i$ invests $a_{i,t} \in [0, \bar{a}]$
to improve the quality of its product. The action changes the state of the firm in a stochastic fashion as
described below.

Payoffs. We consider a payoff function derived from price competition under a classic logit demand
system. In such a model, there are $n$ consumers in the market. In period $t$, consumer $j$ receives utility $u_{ij,t}$
from consuming the good produced by firm $i$ given by: $u_{ij,t} = \theta_1 \ln(x_{i,t} + 1) + \theta_2 \ln(Y - p_{i,t}) + \nu_{ij,t}$, where
$\theta_1, \theta_2 > 0$, $Y$ is the consumer's income, and $p_{i,t}$ is the price of the good produced by firm $i$. Here $\nu_{ij,t}$ are
i.i.d. Gumbel random variables that represent unobserved characteristics for each consumer-good pair.

We assume that there are $m$ firms that set prices in the spot market. For a constant marginal production
cost $c$, there is a unique Nash equilibrium in pure strategies of the pricing game, denoted $p^*$ (Caplin and
Nalebuff 1991). For our limit profit function, we consider an asymptotic regime in which the market size $n$
and the number of firms $m$ grow to infinity at the same rate. The limiting profit function corresponds to a
logit model of monopolistic competition (Besanko et al. 1990) and is given by
$\pi(x, a, f) = \frac{\tilde{c}\,(x+1)^{\theta_1}}{\sum_{y} f(y)(y+1)^{\theta_1}} - d a$,
where $\tilde{c}$ is a constant that depends on the limit equilibrium price, $c$, $\theta_2$, and $Y$. Here the second term is
the cost of investment, where $d > 0$ is the marginal cost per unit investment.

Transition dynamics. We use dynamics similar to those in Pakes and McGuire (1994) that have been
widely used in dynamic oligopoly models. Compared to that paper, we assume random shocks are
idiosyncratic. At each time period, a firm's investment of $a$ is successful with probability $\frac{\alpha a}{1 + \alpha a}$ for some $\alpha > 0$,
in which case the quality level of its product increases by one level. The parameter $\alpha$ represents the
effectiveness of the investment. The firm's product depreciates one quality level with probability $\delta \in (0, 1)$
independently at each time period. Thus a firm's state decreases by one with probability $\frac{\delta}{1 + \alpha a}$; it increases
by one with probability $\frac{(1 - \delta)\alpha a}{1 + \alpha a}$ and stays at the same level with probability $\frac{1 - \delta + \delta \alpha a}{1 + \alpha a}$.
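The three transition probabilities follow from treating investment success and depreciation as independent events, and this can be checked with a short sketch. The function names and parameter values below are illustrative assumptions, and the boundary at state 0 (where a further decrease is impossible) is deliberately not modeled.

```python
def transition_probs(a, alpha, delta):
    """Return (down, stay, up) probabilities for investment level a.

    Investment succeeds with probability alpha*a / (1 + alpha*a); independently,
    the product depreciates with probability delta. The net move is
    success minus depreciation. The boundary at state 0 is ignored here.
    """
    success = alpha * a / (1.0 + alpha * a)
    down = delta * (1.0 - success)   # depreciation only
    up = (1.0 - delta) * success     # success only
    stay = 1.0 - down - up           # both events occur, or neither does
    return down, stay, up


def drift(a, alpha, delta):
    """Expected one-period change in the state at investment level a."""
    down, _, up = transition_probs(a, alpha, delta)
    return up - down


# Illustrative parameter values (assumptions, not from the paper).
example = transition_probs(a=1.0, alpha=1.0, delta=0.5)
```

With zero investment the drift equals $-\delta$, so a firm that stops investing at high quality levels is pulled back down; this is the mechanism behind the light-tail intuition discussed in this section.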

Discussion. Our main result for this model is the following proposition. The proof can be found in 
Section 7.1. 

Proposition 1. Suppose that $\theta_1 < 1$. Then there exists an SE for the dynamic oligopoly model, and all SE
possess the AME property.

The preceding result has a natural interpretation in terms of increasing and decreasing returns to higher
states. Recall that $\theta_1$ represents how much consumers value the quality of the products, and hence if $\theta_1 < 1$,
firms have strictly decreasing marginal returns in their payoff from increasing their own state. This implies
that as their state grows, firms have less incentive to invest in improving their own state and ensures that, in
equilibrium, the distribution of firms over the state space has a light tail and, therefore, the market structure
becomes fragmented in the limit. On the other hand, if $\theta_1 > 1$, then firms have an increasing marginal gain
in their payoff from increasing their own state. Because the marginal cost of investment is constant, firms
may continue to invest large amounts to improve their state even at very large states. Thus, a single firm
optimization problem may not even induce a stable Markov process, and hence an SE may not exist (and the
AME property may fail).
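The dichotomy can be seen numerically in the single-period profit. The sketch below assumes the limiting profit has the form $\tilde{c}(x+1)^{\theta_1} / \sum_y f(y)(y+1)^{\theta_1} - da$; the function names, the population state `f_demo`, and the default constants are hypothetical choices made for illustration only.

```python
def limit_profit(x, a, f, theta1, c_tilde=1.0, d=1.0):
    """Assumed limiting profit: c_tilde*(x+1)^theta1 / sum_y f(y)(y+1)^theta1 - d*a.

    f maps states to probability mass; c_tilde and d are illustrative constants.
    """
    denom = sum(mass * (y + 1) ** theta1 for y, mass in f.items())
    return c_tilde * (x + 1) ** theta1 / denom - d * a


def marginal_gains(f, theta1, x_max):
    """Profit gain from moving one quality level up, for x = 0, ..., x_max - 1."""
    return [limit_profit(x + 1, 0.0, f, theta1) - limit_profit(x, 0.0, f, theta1)
            for x in range(x_max)]


# Hypothetical population state used only for this illustration.
f_demo = {0: 0.5, 1: 0.3, 2: 0.2}
gains_decreasing = marginal_gains(f_demo, theta1=0.5, x_max=10)  # theta1 < 1
gains_increasing = marginal_gains(f_demo, theta1=2.0, x_max=10)  # theta1 > 1
```

For a fixed population state the denominator is a constant, so the marginal gain inherits the concavity of $(x+1)^{\theta_1}$ when $\theta_1 < 1$ and its convexity when $\theta_1 > 1$, while the marginal cost of investment stays at $d$ in both cases.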

This result matches our intuition for exactly those regimes where SE work well as approximations to 
equilibria in finite models. In industries with decreasing returns, we expect to see a fragmented structure in 
the limit. By contrast, in industries with increasing returns, market concentration would likely result in the 
limit, i.e., a few firms capture most of the demand in the market. This is precisely where the AME property 
ceases to hold. 

4.2 Dynamic Oligopoly Models with Positive Spillovers 

In this section, we extend the previous model to account for positive spillovers, or externalities, across 
firms. Spillovers are commonly observed in industry data and could arise, for example, due to laggard firms 
imitating leaders' R&D activities (Griliches 1998). The main difference from the preceding model is that 
now transition dynamics are coupled among the firms: one firm's state is more likely to increase if other 
firms are at higher quality levels. 

Again, we are led to consider the effect of spillovers on market structure. From a technical standpoint, 
the main complexity is that firms' best responses may lead to unbounded distributions over the state space, 
due to the spillover effect. Thus, in order to ensure existence of SE and the AME property, we need a 
condition that controls the spillover effect: intuitively, if the spillover effect is not "too strong", then the 
dynamics will effectively exhibit decreasing returns. Our results quantify this sufficient condition. As 
before, in this case, market fragmentation is obtained in the limit of many firms. 

To introduce spillovers, we consider a formal model in which the state space, action space, and payoff 
are identical to the previous section, and we continue to use the same notation. However, we modify the 
transition kernel to include spillovers, as described below. 

Transition dynamics. We follow the model of Xu (2008), in which transition dynamics depend not only
on the action of the firm, but also on the state of its competitors. Formally, let $s^{(m)}_{-i,t}$ be the spillover effect
of the population state on player $i$ at time $t$, where: $s^{(m)}_{-i,t} = \sum_{y \in \mathcal{X}} f^{(m)}_{-i,t}(y)\, h_{i,t}(y)$. Here $h_{i,t}(y)$ is a weight
function that distinguishes the effect of different states. For this example, we use $h_{i,t}(y) = \zeta(y)\, 1_{\{y > x_{i,t}\}}$ for
some uniformly bounded function $\zeta(y)$. In this case, a firm is affected with spillovers only from firms that
have a better state than its own, which seems natural. We define the effective investment of player $i$ at time
$t$ by: $e_{i,t} = a_{i,t} + \gamma s^{(m)}_{-i,t}$. The constant $\gamma$ is a spillover coefficient and it captures the effect of industry state
on the state transition. A higher value of $\gamma$ means a higher spillover effect. With an effective investment
of $e$, similar to Section 4.1, a firm's state increases by one level with probability $\frac{\alpha e}{1 + \alpha e}$. Finally, as before,
the firm's product depreciates in quality by one level with probability $\delta \in (0, 1)$ independently at each time
period.

Discussion. Since the kernel now depends on the population state $f$ through the spillover effect, even
if $\theta_1 < 1$, the state of an agent may grow due to large competitor states. This may lead to a
scenario where the image of $\Phi$ is unbounded, because firms may exhibit unbounded growth. The following
proposition provides a simple condition for existence of SE. The proof can be found in Section 7.2.

Proposition 2. Suppose that $\theta_1 < 1$, and:

$$\gamma < \frac{\delta}{(1 - \delta)\,\alpha\, \sup_{y} \zeta(y)}. \tag{10}$$

Then there exists an SE for the dynamic oligopoly model with spillovers, and all SE possess the AME
property.

Condition (10) admits a simple interpretation. This condition enforces a form of decreasing returns in 
the spillover effect. If the spillover effect is too large relative to depreciation — i.e., if (10) fails — then the 
state of a given firm has positive drift whenever other firms have large states; and in this case we expect that,
for some $f$, the spillover effect can lead to optimal oblivious strategies that yield unbounded growth. On the
other hand, when (10) holds, then this effect is controlled, and despite the presence of positive spillovers the 
state distribution has a light tail in equilibrium and the industry becomes fragmented in the limit. 
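The drift interpretation can be made concrete with a small sketch. It assumes the transition probabilities of Section 4.1 with the effective investment $e = a + \gamma s$ in place of $a$, in which case the expected one-period change is $((1-\delta)\alpha e - \delta)/(1 + \alpha e)$. With zero own investment and the spillover at its upper bound $\sup_y \zeta(y)$, this is negative exactly when $\gamma < \delta / ((1-\delta)\alpha \sup_y \zeta(y))$. That threshold is what this algebra gives and is used here purely as an illustration; all function names and numbers are hypothetical.

```python
def spillover(f_others, x_i, zeta):
    """Spillover s = sum over y > x_i of f_others(y) * zeta(y)."""
    return sum(mass * zeta(y) for y, mass in f_others.items() if y > x_i)


def effective_investment(a, s, gamma):
    """Own investment plus the spillover scaled by the coefficient gamma."""
    return a + gamma * s


def drift(e, alpha, delta):
    """Expected one-period state change at effective investment e."""
    return ((1.0 - delta) * alpha * e - delta) / (1.0 + alpha * e)


def gamma_threshold(alpha, delta, zeta_sup):
    """Largest gamma for which zero-investment drift is negative at maximal spillover."""
    return delta / ((1.0 - delta) * alpha * zeta_sup)


# Illustrative numbers: all competitors sit at a high state, zeta is constant at 1.
f_others = {10: 1.0}
s = spillover(f_others, x_i=0, zeta=lambda y: 1.0)
weak = drift(effective_investment(0.0, s, gamma=0.5), alpha=1.0, delta=0.5)
strong = drift(effective_investment(0.0, s, gamma=2.0), alpha=1.0, delta=0.5)
```

Here the threshold equals 1, so the weak-spillover case ($\gamma = 0.5$) has negative drift even without investment, while the strong-spillover case ($\gamma = 2$) drifts upward whenever competitors are at higher states.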

What happens when the sufficient condition fails? We present one informal scenario that suggests market 
concentration may result. Observe that it is plausible that if the condition fails, a few firms will have enough
incentive to grow large to obtain a competitive advantage. Moreover, it is also plausible that a significant
fraction of "fringe" firms will remain small to free-ride on the "dominant" firms. In this sense, when our 
condition is violated, a dramatically different market structure might be expected. 

4.3 Learning-By-Doing 

Another example that commonly arises in oligopoly settings is learning-by-doing, where firms become more 
efficient by producing goods. In a learning-by-doing model, the state of the firm represents its experience 
level; this grows in response to production, and otherwise depreciates over time. 

In this type of model, it is clear that we require a dichotomy between "increasing" and "decreasing" 
returns to experience. Firms have to produce even in the absence of learning, simply to earn profits in each 
period. Note that if experience levels continue to grow without bound, then it will be impossible to ensure 
SE are light tailed. We show this is in fact sufficient: as long as experience begins to depreciate at sufficiently 
large states (in a sense we make precise), then SE exist, the market becomes fragmented in the limit, and the 
AME property holds. 

We now describe our model; the variant we study is inspired by Fudenberg and Tirole (1983). 

States. We let the state $x_{i,t}$ represent the cumulative experience level of a firm at time $t$; this represents
the knowledge accumulated through past production.

Actions. The action $a_{i,t}$ represents the firm's output (i.e., goods produced) at time $t$. We consider a
model in which firms compete on quantity; thus firms are coupled to each other through their actions. As
discussed in Section 3.5, such an extension can be accommodated within our framework by restricting pure
actions to lie on a finite subset $S = \{0, 1, \ldots, s_{\max}\}$ of the integers.$^3$

3 This amounts to discretizing the action space of production quantities. In this case, we allow for mixed strategies to ensure 
existence of SE (see Proposition 5). However, note that in many models of interest, under the appropriate concavity assumptions, 
this is not very restrictive as firms will mix between two adjacent pure actions in equilibrium.

Payoffs. At each time period, firms produce goods and compete in a market with $n$ consumers. Let
$P_n(\cdot) \ge 0$ be the inverse demand function for a market size of $n$. For state $x$, pure action $s$, and population
state-action profile $f$, we can write the payoff function as
$\pi_n(x, s, f, m) = s P_n\left(s + (m - 1) \sum_{x', s'} s' f(x', s')\right) - C(x, s)$,
where the argument of $P_n$ is the aggregate output (from $m$ firms) in the market. Note that $f$
is a distribution over state-action pairs. Here, $C(x, s)$ denotes the cost of producing quantity $s$ when the
firm's experience level is $x$. We assume that $C$ is nonnegative, decreasing, and convex in $x$; is increasing
and convex in $s$; and has decreasing differences between $x$ and $s$. Consider a limiting case where
both the number of firms $m$ and the market size $n$ become large at the same rate. We assume that there
exists a limiting decreasing continuous demand function $P$ such that the limit profit function is given by
$\pi(x, s, f) = s P\left(\sum_{x', s'} s' f(x', s')\right) - C(x, s)$. Note that the limiting case represents perfect competition
as firms become price takers.

Transition dynamics. A firm's cumulative experience is improved as it produces more goods since it 
learns from the production process. On the other hand, experience capital depreciates over time due to 
"organizational forgetting." We assume that a firm's experience evolves independent of the experience level 
or the output of other firms in the market. For concreteness, we assume the transition dynamics are the same 
as those described in Section 4.1.

Discussion. Let $\lim_{x \to \infty} C(x, s) = \underline{C}(s)$, that is, $\underline{C}(s)$ is the cost of producing quantity $s$ for a firm
with infinite experience. Our main result for this model is the following proposition. The proof can be found
in Section 7.3.

Proposition 3. Let $s^*$ be the production level that maximizes $sP(0) - \underline{C}(s)$. Suppose that for all sufficiently
large $x$ and all actions $s \in [0, s^*]$, we have $\sum_{x'} x' P(x' \mid x, s) < x$; i.e., the state has negative drift at all
such pairs $(x, s)$. Then there exists an SE for the learning-by-doing model, and all SE possess the AME
property.

Observe that sp — C(x, s) is the single period profit to a firm when the market price is p, the firm 
produces quantity s, and its experience level is x. Generally speaking, because of learning, firms at low 
experience levels face strong incentives to increase their experience, leading them to produce beyond the 
single period optimal quantity. However, for firms at high experience levels, the choice of optimal quantity 
is driven primarily by maximization of single period profit (because C(x,s) is decreasing and convex in 
x). The quantity s* is an upper bound on the maximizer of single period profits, so the drift condition in 
the proposition ensures that at high experience levels, firms' maximization of single period profit does not 
continue to yield unbounded growth in the experience level.$^4$

The condition requires that the transition kernel must exhibit sufficiently strong decreasing returns to 
scale; as long as the possible productivity gains induced by learning-by-doing are reduced at larger states, 
light-tailed SE will exist and the market becomes fragmented in the limit. However, if there are not
diminishing returns to learning-by-doing, then a firm's experience level will grow without bound and hence a
light-tailed SE may not exist. This is consistent with prior observations: an industry for which
learning-by-doing is prevalent may naturally become concentrated over time (Dasgupta and Stiglitz 1988).

4 For example, consider $C(x, s) = s/x$. Then $s^*$ is the largest allowable pure action; hence, the condition requires that all
actions have negative drift for sufficiently large experience levels. For a less restrictive case, consider $C(x, s) = s^2/x + s^2/c$.
Then, $s^* = cP(0)/2$, so the condition requires that all actions less than or equal to $cP(0)/2$ eventually have negative drift.

5 Theory: Existence 

In this section, we study the existence of light-tailed stationary equilibria. We recall that $(\mu, f)$ is a
stationary equilibrium if and only if $f$ is a fixed point of $\Phi(f) = \mathcal{D}(\mathcal{P}(f), f)$, such that $\mu \in \mathcal{P}(f)$ and $f \in \mathcal{D}(\mu, f)$.
Thus our approach is to find conditions under which the correspondence $\Phi$ has a fixed point; in particular,
we aim to apply Kakutani's fixed point theorem to $\Phi$ to find an SE.

Kakutani's fixed point theorem requires three essential pieces: (1) compactness of the range of $\Phi$; (2)
convexity of both the domain of $\Phi$, as well as $\Phi(f)$ for each $f$; and (3) appropriate continuity properties
of the operator $\Phi$. It is clear, therefore, that our analysis requires topologies on both the set of possible
strategies and the set of population states. For the set of oblivious strategies $\mathfrak{M}_O$, we use the topology of
pointwise convergence.

For the set of population states, we recall that a key concept in our analysis is that of "light-tailed"
population states. To formalize this notion, for the set of population states we consider a topology induced by
the 1-p norm. Given $p > 0$, the 1-p norm of a function $f : \mathcal{X} \to \mathbb{R}$ is given by $\|f\|_{1\text{-}p} = \sum_{x \in \mathcal{X}} \|x\|_p^p\, |f(x)|$,
where $\|x\|_p$ is the usual $p$-norm of a vector. Let $\mathfrak{F}_p$ be the set of all possible population states on $\mathcal{X}$ with finite
1-p norm, i.e., $\mathfrak{F}_p = \{f \in \mathfrak{F} : \|f\|_{1\text{-}p} < \infty\}$. The requirement $f \in \mathfrak{F}_p$ imposes a light-tail condition over
the population state $f$. The exponent $p$ controls the weight in the tail of the population state: distributions
with finite 1-p norms for larger $p$ have lighter tails. The condition essentially requires that larger states must
have a small probability of occurrence under $f$. As we discussed in the context of our examples, light-tailed
SE imply that the market structure becomes fragmented in the limit of a large number of firms.
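To illustrate how the 1-p norm separates light and heavy tails, the sketch below evaluates it for scalar states, assuming the norm weights each mass $|f(x)|$ by $|x|^p$. The helper names and the two example distributions (a geometric one and a power law proportional to $1/x^2$) are illustrative assumptions; both are truncated at a finite state so that the sums are computable.

```python
import math


def one_p_norm(f, p):
    """Sum over states of |x|^p * |f(x)| for scalar states; f maps state -> mass."""
    return sum(abs(x) ** p * abs(mass) for x, mass in f.items())


def geometric(q, n_max):
    """Geometric distribution f(x) = (1 - q) * q^x, truncated at n_max."""
    return {x: (1.0 - q) * q ** x for x in range(n_max + 1)}


def power_law(n_max):
    """Distribution f(x) proportional to 1/x^2 on x >= 1, truncated at n_max."""
    c = 6.0 / math.pi ** 2
    return {x: c / x ** 2 for x in range(1, n_max + 1)}


light = one_p_norm(geometric(0.5, 200), p=1)      # settles near q / (1 - q)
heavy_small = one_p_norm(power_law(1_000), p=1)   # grows like a harmonic sum
heavy_large = one_p_norm(power_law(100_000), p=1)
```

The geometric distribution has a finite 1-p norm for every $p$, whereas for the power law the truncated sums with $p = 1$ keep growing as the truncation point increases, so that distribution is not light-tailed in the sense used here.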

We start with the following restatement of Kakutani's theorem. 

Theorem 1 (Kakutani-Fan-Glicksberg). Suppose there exists a set $\mathfrak{C} \subset \mathfrak{F}_p$ such that (1) $\mathfrak{C}$ is convex and
compact (in the 1-p norm), with $\Phi(\mathfrak{C}) \subset \mathfrak{C}$; (2) $\Phi(f)$ is convex and nonempty for every $f \in \mathfrak{C}$; and (3) $\Phi$
has a closed graph on $\mathfrak{C}$.$^5$ Then there exists a stationary equilibrium $(\mu, f)$ with $f \in \mathfrak{C}$.

In the remainder of this section, we find exogenous conditions on model primitives to ensure these
requirements are met. We tackle them in reverse order. We first show that under an appropriate continuity
condition, $\Phi$ has a closed graph. Next, we study conditions under which $\Phi(f)$ can be guaranteed to be
convex. Finally, we provide conditions on model primitives under which there exists a compact, convex set
$\mathfrak{C}$ with $\Phi(\mathfrak{F}) \subset \mathfrak{C}$. The conditions we provide suffice to guarantee that $\Phi(f)$ is nonempty for all $f \in \mathfrak{F}$.
Taken together our conditions ensure existence of SE, as well as an additional stronger characterization: all
SE are light-tailed, i.e., they have finite 1-p norm. This fact will allow us to show that every SE satisfies the
AME property in the next section.

5 $\Phi$ has a closed graph if the set $\{(f, g) : g \in \Phi(f)\} \subset \mathfrak{F}_p \times \mathfrak{F}_p$ is closed (where $\mathfrak{F}_p$ is endowed with the 1-p norm).

5.1 Closed Graph

In this section we develop conditions to ensure the model is appropriately "continuous." Before stating
the desired assumption, we introduce one more piece of notation. Without loss of generality, we can view
the state Markov process in terms of the increments from the current state. In particular, we can write
$x_{i,t+1} = x_{i,t} + \xi_{i,t}$, where $\xi_{i,t}$ is a random increment distributed according to the probability mass function
$Q(\cdot \mid x, a, f)$ defined by $Q(z' \mid x, a, f) = P(x + z' \mid x, a, f)$. Note that $Q(z' \mid x, a, f)$ is positive for only
those $z'$ such that $x + z' \in \mathcal{X}$. We make the following assumptions over model primitives.

Assumption 1 (Continuity). 1. Compact action set. The set of feasible actions for a player, denoted by
$A$, is compact.

2. Bounded increments. There exists $M \ge 0$ such that, for all $z$ with $\|z\|_\infty > M$, $Q(z \mid x, a, f) = 0$,
for all $x \in \mathcal{X}$, $a \in A$ and $f \in \mathfrak{F}$.

3. Growth rate bound. There exist constants $K$ and $n \in \mathbb{Z}_+$ such that $\sup_{a \in A,\, f \in \mathfrak{F}} |\pi(x, a, f)| \le K(1 +
\|x\|_\infty)^n$ for every $x \in \mathcal{X}$, where $\|\cdot\|_\infty$ is the sup norm.

4. Payoff and kernel continuity. For each fixed $x, x' \in \mathcal{X}$ and $f \in \mathfrak{F}$, the payoff $\pi(x, a, f)$ and the kernel
$P(x' \mid x, a, f)$ are continuous in $a \in A$.

In addition, for each fixed $x, x' \in \mathcal{X}$, the payoff $\pi(x, a, f)$ and the kernel $P(x' \mid x, a, f)$ are jointly
continuous in $a \in A$ and $f \in \mathfrak{F}_p$ (where $\mathfrak{F}_p$ is endowed with the 1-p norm).$^6$

The assumptions are fairly mild and are satisfied in a variety of models of interest. For example, all
models in Section 4 satisfy them. The first assumption is standard. We also place a finite (but possibly large)
bound on how much an agent's state can change in one period (Assumption 1.2), an assumption that is 
reasonably weak. The polynomial growth rate bound on the payoff is quite weak, and serves to exclude the 
possibility of strategies that yield infinite expected discounted payoff. 

Finally, Assumption 1.4 ensures that the impact of action on payoff and transitions is continuous. It also 
imposes that the payoff function and transition kernel are "smooth" functions of the population state under 
an appropriate norm. We note that when $\mathcal{X}$ is finite, then $\|f\|_{1\text{-}p}$ induces the same topology as the standard
Euclidean norm. However, when $\mathcal{X}$ is infinite, the 1-p norm weights larger states higher than smaller states.
In many applications, other players at larger states have a greater impact on the payoff; in such settings,
continuity of the payoff in $f$ in the 1-p norm naturally controls for this effect. Given a particular model,
the exponent $p$ should be chosen to ensure continuity of the payoff and transition kernel.$^7$ The following
proposition establishes that the continuity assumptions embodied in Assumption 1 suffice to ensure that $\Phi$
has a closed graph.

Proposition 4. Suppose that Assumption 1 holds. Then $\Phi$ has a closed graph on $\mathfrak{F}_p$.

6 Here we view $P(x' \mid x, a, f)$ as a real valued function of $a$ and $f$, for fixed $x, x'$; note that since we have also assumed bounded
increments, this notion of continuity is equivalent to assuming that $P(\cdot \mid x, a, f)$ is jointly continuous in $a$ and $f$, for fixed $x$, with
respect to the topology of weak convergence on distributions over $\mathcal{X}$.

7 See Section 4 and Section 7 for concrete examples. For example, in subsection 4.1 the payoff function depends on
the distribution $f$ via its $\theta_1$ moment, so it is natural to endow the set of distributions with the 1-p norm
with $p = \theta_1$.
5.2 Convexity

Next, we develop conditions to ensure that $\Phi(f)$ is convex. We first provide a result for mixed strategies and
then a result for pure strategies.

5.2.1 Mixed Strategies 

We start by considering a simple model, where the action set A is the simplex of randomized actions on 
a base set of finite pure actions. This setting is particularly useful when we assume players are coupled 
through actions (see Section 3.5). Formally, we have the following definition. 

Definition 9. An anonymous stochastic game has a finite action space if there exists a finite set S such that 
the following three conditions hold: 

1. $A$ consists of all probability distributions over $S$: $A = \{a \ge 0 : \sum_{s \in S} a(s) = 1\}$.

2. $\pi(x, a, f) = \sum_{s \in S} a(s)\, \pi(x, s, f)$, where $\pi(x, s, f)$ is the payoff evaluated at state $x$,
population state $f$, and pure action $s$.

3. $P(x' \mid x, a, f) = \sum_{s \in S} a(s)\, P(x' \mid x, s, f)$, where $P(x' \mid x, s, f)$ is the kernel evaluated
at states $x'$ and $x$, population state $f$, and pure action $s$.

Essentially, the preceding definition allows inclusion of randomized strategies in our search for SE. This 
model inherits Nash's original approach to establishing existence of an equilibrium for static games, where 
randomization induces convexity on the strategy space. We show next that in any game with a finite action
space, the set $\Phi(f)$ is always convex.

Proposition 5. Suppose Assumption 1 holds. In any anonymous stochastic game with a finite action space,
$\Phi(f)$ is convex for all $f \in \mathfrak{F}$.

The preceding result ensures that if randomization is allowed over a finite set of actions, then the map
$\Phi$ is convex-valued. We conclude by noting that another simplification is possible when working with a
finite action space. In particular, it is straightforward to show that if Assumption 1 holds for the payoff and
transition kernel over all pure actions, then it also holds for the payoff and transition kernel over all mixed
actions; Proposition 4 follows similarly. The proof follows easily from the linearity of the payoff and
transition kernel in the mixed action. This is a valuable insight, since in applications it simplifies
checking the model assumptions necessary to guarantee existence of an equilibrium. We discuss a similar
point in Section 5.3.
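The linearity in Definition 9 can be sketched in a few lines. The example below fixes a state $x$ and a
population state $f$ and builds the payoff and kernel of a mixed action from those of the pure actions. The
action labels and all numerical values are hypothetical.

```python
def mixed_payoff(a, pure_payoff):
    """pi(x, a, f) = sum_s a(s) * pi(x, s, f) for a mixed action a = {s: prob}."""
    return sum(prob * pure_payoff[s] for s, prob in a.items())


def mixed_kernel(a, pure_kernel):
    """P(. | x, a, f) = sum_s a(s) * P(. | x, s, f).

    pure_kernel maps each pure action to a dict {next_state: prob}.
    """
    out = {}
    for s, prob in a.items():
        for x_next, q in pure_kernel[s].items():
            out[x_next] = out.get(x_next, 0.0) + prob * q
    return out


# Hypothetical primitives at one fixed (x, f).
PURE_PAYOFF = {"low": 1.0, "high": 0.4}
PURE_KERNEL = {"low": {0: 0.8, 1: 0.2}, "high": {0: 0.3, 1: 0.7}}
MIXED = {"low": 0.25, "high": 0.75}

print(mixed_payoff(MIXED, PURE_PAYOFF))
print(mixed_kernel(MIXED, PURE_KERNEL))
```

Because both maps are linear in $a$, any continuity or growth property that holds for every pure action carries
over to every mixed action by taking convex combinations.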

5.2.2 Pure Strategies 

In contrast to the preceding section, many relevant applications typically require existence of equilibria in 
pure strategies. For such examples, we employ an approach based on the following proposition. 

Proposition 6. Suppose that $\mathcal{P}(f)$ is a singleton for all $f \in \mathfrak{F}$. Then $\Phi(f)$ is convex
for all $f \in \mathfrak{F}$.

The proof is straightforward: $\mathcal{D}(\mu, f)$ is convex-valued for any fixed $\mu$ and $f$, since the set of
invariant distributions for the kernel defined by $\mu$ and $f$ is identified by a collection of linear equations.
Thus if $\mathcal{P}(f)$ is a singleton, then $\Phi(f) = \mathcal{D}(\mathcal{P}(f), f)$ will be convex.
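The convexity claim in the proof can be checked numerically on a small example. The sketch below uses a
hypothetical three-state chain with two closed classes, so that it has more than one invariant distribution, and
verifies that a convex combination of two invariant distributions is again invariant. The kernel is illustrative
and is not derived from any model in the paper.

```python
def step(dist, kernel):
    """One period of the chain: returns dist * kernel for a row-stochastic matrix."""
    n = len(kernel)
    return [sum(dist[x] * kernel[x][y] for x in range(n)) for y in range(n)]


def is_invariant(dist, kernel, tol=1e-12):
    """True if dist solves the linear system dist = dist * kernel (up to tol)."""
    return all(abs(a - b) <= tol for a, b in zip(dist, step(dist, kernel)))


def mix(g1, g2, lam):
    """Convex combination lam * g1 + (1 - lam) * g2."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(g1, g2)]


# Hypothetical chain with closed classes {0, 1} and {2}.
KERNEL = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
G1 = [0.5, 0.5, 0.0]  # invariant distribution supported on {0, 1}
G2 = [0.0, 0.0, 1.0]  # invariant distribution supported on {2}

print(is_invariant(mix(G1, G2, 0.3), KERNEL))
```

The check succeeds for any weight in $[0, 1]$ because invariance is a system of linear equations in the
distribution.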

We now provide two different assumptions over model primitives that guarantee that $\mathcal{P}(f)$ is a singleton,
for all $f \in \mathfrak{F}$. The first assumption is a condition introduced by Doraszelski and Satterthwaite (2010) and is
described in detail there. The assumption has found wide application in dynamic oligopoly models. 

Assumption 2. 1. The state space is scalar, i.e., $X \subseteq \mathbb{Z}_+$, and the action space $A$ is a compact
interval of the real numbers.

2. The payoff $\pi(x, a, f)$ is strictly decreasing and concave in $a$ for fixed $x$ and $f$.

3. For all $f \in \mathfrak{F}$, the transition kernel $P$ is unique investment choice (UIC) admissible: there exist
functions $g_1, g_2, g_3$ such that $P(x' \mid x, a, f) = g_1(x, a, f)\, g_2(x', x, f) + g_3(x', x, f)$ for all
$x', x, a, f$, where $g_1(x, a, f)$ is strictly increasing and strictly concave in $a$.

The preceding conditions ensure that for all population states $f$ and initial states $x$, and all continuation
value functions, the maximization problem on the right hand side of Bellman's equation (cf. (16) in the
Appendix) is strictly concave, or that the unique maximizer is a corner solution.

The previous assumption requires a single-dimensional state space and action space. Our next assumption
imposes a different set of conditions over the payoff and the transition kernel, and allows for
multidimensional state and action spaces. Before providing our second condition, we require some additional
terminology. Let $S \subseteq \mathbb{R}^n$. We say that a function $g : S \to \mathbb{R}$ is nondecreasing if
$g(x') \ge g(x)$ whenever $x' \ge x$ (where we write $x' \ge x$ if $x'$ is at least as large as $x$ in every
component). We say $g$ is strictly increasing if the inequality is strict. Let $P_\theta$ be a family of
probability distributions on $X$ indexed by $\theta \in S$. Given a nondecreasing function $u : X \to \mathbb{R}$,
define $E_\theta[u] = \sum_x u(x) P_\theta(x)$. We say that $P_\theta$ is stochastically nondecreasing in the
parameter $\theta$ if $E_\theta[u]$ is nondecreasing in $\theta$ for every nondecreasing function $u$.
Similarly, we say that $P_\theta$ is stochastically concave in the parameter $\theta$ if $E_\theta[u]$ is a concave
function of $\theta$ for every nondecreasing function $u$. We say that $P_\theta$ is strictly stochastically concave
if, in addition, $E_\theta[u]$ is strictly concave for every strictly increasing function $u$. We have the
following assumption.

Assumption 3. 1. The action set $A$ is convex.

2. The payoff $\pi(x, a, f)$ is strictly increasing in $x$ for fixed $a$ and $f$, and the kernel
$P(\cdot \mid x, a, f)$ is stochastically nondecreasing in $x$ for fixed $a$ and $f$.

3. The payoff is concave in $a$ for fixed $x$ and $f$, and the kernel is stochastically concave in $a$ for fixed $x$
and $f$, with at least one of the two strictly concave in $a$.
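The stochastic ordering used in Assumption 3 can be tested mechanically when the state is scalar and finite. The
sketch below relies on the standard fact that, on a totally ordered finite set, $E_\theta[u]$ is nondecreasing in
$\theta$ for every nondecreasing $u$ exactly when the tail probabilities $P_\theta(X \ge k)$ are nondecreasing in
$\theta$ for every $k$. It covers only the scalar case, and the family `FAMILY` is hypothetical.

```python
def tail_sums(dist):
    """Return [P(X >= k) for k = 0, ..., n-1] for a list of masses on {0, ..., n-1}."""
    tails, running = [], 0.0
    for mass in reversed(dist):
        running += mass
        tails.append(running)
    return tails[::-1]


def stochastically_nondecreasing(family, tol=1e-12):
    """family is a list of distributions ordered by increasing parameter theta."""
    for lower, upper in zip(family, family[1:]):
        if any(u < l - tol for l, u in zip(tail_sums(lower), tail_sums(upper))):
            return False
    return True


def expectation(u, dist):
    """E[u] = sum_x u(x) * dist(x)."""
    return sum(u(x) * mass for x, mass in enumerate(dist))


# Hypothetical family on {0, 1, 2}: mass shifts to higher states as theta grows.
FAMILY = [
    [0.6, 0.3, 0.1],
    [0.4, 0.4, 0.2],
    [0.2, 0.4, 0.4],
]

print(stochastically_nondecreasing(FAMILY))
print([expectation(lambda x: x ** 2, d) for d in FAMILY])
```

For multidimensional states the componentwise order is only partial, and the tail-sum test above is no longer
sufficient; one would need to check upper sets instead.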

The following result shows the preceding conditions on model primitives ensure the optimal oblivious 
strategy is unique. 

Proposition 7. Suppose Assumption 1 holds, and that at least one of Assumptions 2 or 3 holds. Then
$\mathcal{P}(f)$ is a singleton, and thus $\Phi(f)$ is convex for all $f \in \mathfrak{F}$.
5.3 Compactness

In this section, we provide conditions under which we can guarantee the existence of a compact, convex,
nonempty set $\mathfrak{C}$ such that $\Phi(\mathfrak{F}) \subseteq \mathfrak{C}$. The assumptions we make are closely
related to those needed to ensure that $\Phi(f)$ is nonempty. To see the relationship between these results,
observe that in Lemma 2 in the Appendix, we show that under Assumption 1 an optimal oblivious strategy always
exists for any $f \in \mathfrak{F}$. Thus to ensure that $\Phi(f)$ is nonempty, it suffices to show that there exists
at least one strategy that possesses an invariant distribution. Our approach to demonstrating existence of an
invariant distribution is based on the Foster-Lyapunov criterion (Meyn and Tweedie 1993). Intuitively, this
criterion checks whether the process that describes the evolution of an agent eventually has "negative" drift
and in this way controls for the growth of the agent's state. This same argument also allows us to bound the
moments of the invariant distribution, which is precisely what is needed to find the desired set $\mathfrak{C}$
that is compact in the 1-p norm.
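The role of negative drift can be illustrated with a toy birth-death chain. The sketch below is not the
Foster-Lyapunov argument itself and the chain is not derived from any optimal strategy; the up and down
probabilities, the truncation level `n`, and the function names are all hypothetical. With downward drift the
long-run distribution is close to geometric and has a small mean; with upward drift the mass piles up at the
artificial cap, which is the truncated analogue of a transient state.

```python
def step(dist, p_up, p_down):
    """Advance a birth-death chain on {0, ..., n-1} by one period.

    Moves that would leave the truncated state space are replaced by staying put.
    """
    n = len(dist)
    new = [0.0] * n
    for x, mass in enumerate(dist):
        up = p_up if x + 1 < n else 0.0
        down = p_down if x > 0 else 0.0
        new[x] += mass * (1.0 - up - down)
        if up:
            new[x + 1] += mass * up
        if down:
            new[x - 1] += mass * down
    return new


def long_run_distribution(p_up, p_down, n=40, iters=5000):
    """Iterate from a point mass at 0 to approximate the invariant distribution."""
    dist = [1.0] + [0.0] * (n - 1)
    for _ in range(iters):
        dist = step(dist, p_up, p_down)
    return dist


def mean_state(dist):
    return sum(x * mass for x, mass in enumerate(dist))


stable = long_run_distribution(p_up=0.2, p_down=0.4)    # drift -0.2 away from state 0
unstable = long_run_distribution(p_up=0.4, p_down=0.2)  # drift +0.2 below the cap

print(mean_state(stable), mean_state(unstable))
```

In the stable case the mean stays near 1 however large the cap is, which is the kind of uniform moment bound
that compactness in the 1-p norm requires.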

One simple condition under which $\Phi(f)$ is nonempty is that the state space is finite; any Markov chain
on a finite state space possesses at least one positive recurrent class. In this case the entire set
$\mathfrak{F}$ is compact in the 1-p norm. Thus we have the following result.

Proposition 8. Suppose Assumption 1 holds, and that the state space $X$ is finite. Then $\Phi(f)$ is nonempty
for all $f \in \mathfrak{F}$, and $\mathfrak{F}$ is compact in the 1-p norm.

We now turn our attention to the setting where the state space may be unbounded; for notational simplicity,
in the remainder of the section we assume $X = \mathbb{Z}_+^d$. In this case, we must make additional assumptions
to control for the agent's growth; these assumptions ensure the optimal strategy does not allow the state to
become transient, and also allow us to bound moments of the invariant distribution of any optimal oblivious
strategy.

In the sequel we restrict attention to multiplicatively separable transition kernels, as defined below. 

Definition 10. The transition kernel is multiplicatively separable if there exist transition kernels
$P_1, \ldots, P_d$ such that for all $x, x' \in X$, $a \in A$, $f \in \mathfrak{F}$ there holds
$P(x' \mid x, a, f) = \prod_{\ell=1}^{d} P_\ell(x'_\ell \mid x, a, f)$. In this case we let $Q_1, \ldots, Q_d$ be the
coordinatewise increment transition kernels; i.e., $Q_\ell(z_\ell \mid x, a, f) = P_\ell(x_\ell + z_\ell \mid x, a, f)$,
for $z$ such that $x + z \in X$.
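As a small illustration of Definition 10, the sketch below assembles a joint next-state distribution from
per-coordinate distributions, holding the current state, action, and population state fixed. The helper name
`joint_kernel` and the numbers are hypothetical.

```python
from itertools import product


def joint_kernel(coordinate_kernels):
    """Combine per-coordinate next-state distributions into a joint one.

    coordinate_kernels is a list [{x_l_next: prob}, ...], one dict per coordinate,
    all evaluated at the same (x, a, f). The result maps tuples
    (x_1_next, ..., x_d_next) to the product of the coordinate probabilities.
    """
    joint = {}
    for combo in product(*(k.items() for k in coordinate_kernels)):
        state = tuple(x for x, _ in combo)
        prob = 1.0
        for _, q in combo:
            prob *= q
        joint[state] = prob
    return joint


# Hypothetical two-dimensional example: coordinate 1 moves from 3, coordinate 2 from 0.
P1 = {3: 0.7, 4: 0.3}
P2 = {0: 0.5, 1: 0.5}

print(joint_kernel([P1, P2]))
```

Conditional on the current state, action, and population state, the coordinates move independently; each
coordinate kernel may still depend on the full current state $x$.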

This is a natural class of dynamics in models with multidimensional state spaces. We note that if X is 
one-dimensional, the definition is vacuous. We introduce the following assumption. 

Assumption 4. 1. For all $\Delta \in \mathbb{Z}_+^d$, there holds
$\limsup_{\|x\|_\infty \to \infty} \sup_{a \in A, f \in \mathfrak{F}} \left[\pi(x + \Delta, a, f) - \pi(x, a, f)\right] \le 0$.

2. The transition kernel $P$ is multiplicatively separable.

3. For $\ell = 1, \ldots, d$, $P_\ell(\cdot \mid x, a, f)$ is stochastically nondecreasing in $x \in X$ and $a \in A$
for fixed $f \in \mathfrak{F}$.

4. For $\ell = 1, \ldots, d$, and for each $a \in A$ and $f \in \mathfrak{F}$, $Q_\ell(\cdot \mid x, a, f)$ is
stochastically nonincreasing in $x \in X$. Further, for all $x \in X$,
$\sup_f \sum_{z_\ell} z_\ell\, Q_\ell(z_\ell \mid x, a, f)$ is continuous in $a$.

5. There exists a compact set $A' \subseteq A$, a constant $K'$, and a continuous, strictly increasing function
$\kappa : \mathbb{R}_+ \to \mathbb{R}_+$ with $\kappa(0) = 0$, such that:

(a) For all $x \in X$, $f \in \mathfrak{F}$, and $a \notin A'$, there exists $a' \in A'$ with $a' \le a$, such that
$\pi(x, a', f) - \pi(x, a, f) \ge \kappa(\|a' - a\|_\infty)$.

(b) For all $\ell$, and all $x'$ such that $x'_\ell \ge K'$,
$\sup_{a' \in A'} \sup_f \sum_{z_\ell} z_\ell\, Q_\ell(z_\ell \mid x', a', f) < 0$.

Some of the previous conditions are natural, while others impose a type of "decreasing returns to higher 
states." First, we discuss the former. Multiplicative separability (Assumption 4.2) is natural. The first part of 
Assumption 4.3 is also fairly weak. The transition kernel is stochastically nondecreasing in state in models 
for which the state is persistent, in the sense that a larger state today increases the chances of being at a 
larger state tomorrow. The transition kernel is stochastically nondecreasing in action in models where larger 
actions take agents to larger states. 

Assumptions 4.1, 4.4, and 4.5 impose a form of "decreasing returns to higher states" in the model. In
particular, Assumption 4.1 ensures the marginal gain in payoff by increasing one's state becomes nonpositive
as the state grows large. This assumption is used to show that for large enough states agents effectively
become myopic; increasing the state further does not provide additional gains. Assumption 4.5 then implies
that as the state grows large, optimal actions produce negative drift, inducing a "light tail" on any invariant
distribution of the resulting optimal oblivious strategy. The set $A'$ can be understood as (essentially) the set
of actions that maximize the single period payoff function. Assumption 4.5 is often natural because in many 
models of interest increasing the state beyond a certain point is costly and requires dynamic incentives; 
agents will take larger actions that induce positive drift only if they consider the future benefits of doing so. 

The first part of Assumption 4.4 imposes a form of decreasing returns in the transition kernel. The
second part of Assumption 4.4 will hold if, for example, the transition kernel is coordinatewise stochastically
nonincreasing in $f \in \mathfrak{F}$ (with respect to the first order stochastic dominance ordering) and Assumption 1
holds. In this case $\sup_f \sum_{z_\ell} z_\ell\, Q_\ell(z_\ell \mid x, a, f) = \sum_{z_\ell} z_\ell\, Q_\ell(z_\ell \mid x, a, \underline{f})$,
where $\underline{f}$ is the distribution that places all its mass at state 0.

Much of the difficulty in the proof of the result lies in ensuring that the tail of any invariant
distribution obtained from an optimal oblivious strategy is uniformly light over the image of $\Phi$. The fact
that Assumptions 4.1, 4.4, and 4.5 are uniform over $f$ is crucial for this purpose.

Under the preceding assumptions we have the following result. 

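Proposition 9. Suppose $X = \mathbb{Z}_+^d$, and Assumptions 1 and 4 hold. Then $\Phi(f)$ is nonempty for all
$f \in \mathfrak{F}$, and there exists a compact, convex, nonempty set $\mathfrak{C}$ such that
$\Phi(\mathfrak{F}) \subseteq \mathfrak{C}$.

Note that the preceding result ensures $\Phi(f) \subseteq \mathfrak{C}$ for all $f \in \mathfrak{F}$.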

We conclude this section with a brief comment regarding finite action spaces, cf. Definition 9. The key 
observation we make is that if Assumption 4 holds with respect to the pure actions — i.e., with A replaced by 
S — then the same result as Proposition 9 holds for mixed actions. A nearly identical argument applies to 
establish the result. 

5.4 Summary of Results 

The previous results can be summarized by the following corollary that imposes conditions over model 
primitives to guarantee the existence of a light-tailed SE. 
Corollary 1. Suppose that (1) Assumption 1 holds; (2) either the game has a finite action space, or
Assumption 2 holds, or Assumption 3 holds; and (3) either the state space $X$ is finite, or $X = \mathbb{Z}_+^d$ and
Assumption 4 holds. Then, there exists an SE, and every SE $(\mu, f)$ has $f \in \mathfrak{F}_p$.

As we have discussed and as one can show in the examples in Section 4, many models of interest satisfy 
Assumption 1 and Assumptions 2 or 3 (or, more generally, some condition that guarantees uniqueness of 
the optimal oblivious strategy); see Section 7. Hence, if these models have a finite state space, existence 
of SE follows immediately. If the state space is unbounded, the only condition that remains to be checked 
to guarantee existence of SE is Assumption 4. As discussed in the examples in Section 4, this condition 
imposes a form of "decreasing returns to higher states." 

We conclude by emphasizing that under the assumptions of the existence result all SE have $f \in \mathfrak{F}_p$; in
other words, all the resulting SE have a light tail. In the context of our examples, as previously discussed,
this implies that all SE yield a fragmented market structure. In addition, the light-tail property, together with 
Assumption 1, will be used in the next section to ensure that the AME property holds. 



6 Theory: Approximation 

In this section we show that under the assumptions of the preceding section, any SE $(\mu, f)$ possesses the
AME property. We emphasize that the AME property is essentially a continuity property in the population
state $f$. Under reasonable assumptions, we show that the time $t$ population state in the system with $m$
players, $f^{(m)}_{-i,t}$, approaches the deterministic population state $f$ in an appropriate sense almost surely
for all $t$ as $m \to \infty$; in particular, this type of uniform law of large numbers will hold as long as $f$ has
tails that are sufficiently light. If $f^{(m)}_{-i,t}$ approaches $f$ almost surely, then informally, if the payoff
satisfies an appropriate continuity property in $f$, we should expect the AME property to hold. The remainder
of the section is devoted to formalizing this argument.

Theorem 2 (AME). Suppose Assumption 1 holds. Let $(\mu, f)$ be a stationary equilibrium with
$f \in \mathfrak{F}_p$. Then the AME property holds for $(\mu, f)$.

Observe that Assumption 1 is also required for the existence of SE that satisfy $f \in \mathfrak{F}_p$. In this sense,
under our assumptions, the AME property is a direct consequence of existence. This relationship between
existence and the AME property is a significant insight of our work.

The proof of the AME property exploits the fact that the 1-p norm of $f$ must be finite (since
$f \in \mathfrak{F}_p$) to show that $\|f^{(m)}_{-i,t} - f\|_{1\text{-}p} \to 0$ almost surely as $m \to \infty$; i.e., the
population state of other players approaches $f$ almost surely under an appropriate norm. Continuity of the
payoff $\pi$ in $f$, together with the growth rate bounds in Assumption 1, yields the desired result.
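The convergence used in the proof can be previewed in a simplified setting. The sketch below draws $m$ states
independently from a light-tailed distribution and measures the distance between the empirical distribution and
the true one. It makes three assumptions that go beyond the text: states are i.i.d. (in the model this holds only
for the initial draw, and states are coupled afterwards), the 1-p norm has the form $\sum_x |x|^p |f(x)|$, and $f$
is geometric with ratio `rho`. All names and parameter values are hypothetical.

```python
import random


def geometric_pmf(x, rho=0.5):
    """f(x) = (1 - rho) * rho**x on x = 0, 1, 2, ..."""
    return (1.0 - rho) * rho ** x


def empirical_distribution(m, rho=0.5, seed=0):
    """Empirical distribution of m i.i.d. draws from the geometric pmf above."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(m):
        x = 0
        while rng.random() < rho:  # continue with probability rho
            x += 1
        counts[x] = counts.get(x, 0) + 1
    return {x: c / m for x, c in counts.items()}


def one_p_distance(f_emp, p=1, rho=0.5, x_max=200):
    """Assumed 1-p distance, truncated at x_max where the geometric tail is negligible."""
    return sum(
        x ** p * abs(f_emp.get(x, 0.0) - geometric_pmf(x, rho)) for x in range(x_max)
    )


for m in (100, 100000):
    print(m, one_p_distance(empirical_distribution(m)))
```

With a heavy-tailed $f$ the weighted sum need not be finite, and the same experiment would not settle down;
this is the informal reason the light-tail requirement appears in Theorem 2.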

In practice, the light-tail condition (i.e., the requirement that $f \in \mathfrak{F}_p$) ensures that an agent's
state rarely becomes too large under the invariant distribution $f$ associated with the dynamics (6). Weintraub
et al. (2011) provide a similar result in a dynamic industry model with entry and exit. Our result, on the
other hand, is more general in terms of the definition of the AME property, as well as the payoff functions
and transition kernels considered. In particular, we allow for dependence of the transition kernel on the
population state. This necessitates a significantly different proof technique, since agents' states do not
evolve independently in general. We note that the light-tail condition is consequential, as it is possible to
construct examples for which stationary equilibria exist, but $f \notin \mathfrak{F}_p$ and the AME property does not
hold (Weintraub et al. 2011).

We conclude by noting that in many models of interest it is more reasonable to assume that the payoff 
function explicitly depends on the number of agents. To study these environments, we consider a sequence 
of payoff functions indexed by the number of agents, $\pi_m(x, a, f)$. Here, the profit function $\pi$ is a limit:
$\lim_{m \to \infty} \pi_m(x, a, f) = \pi(x, a, f)$. (See Section 4 for concrete examples.) In this case, if the
number of players is $m$, the actual expected net present value is defined with $\pi_m$; hence, the payoff function
in the AME property depends on $m$. In Appendix B we show that under a strengthening of Assumption 1, Theorem 2
can be generalized to this setting. 

7 Examples Revisited 

In this section we revisit each of the examples presented in Section 4 and show that all the propositions for 
these examples are consequences of Corollary 1 and Theorem 2. This establishes the key connection in the 
paper between existence of SE and the AME property on one hand, and the impact of model primitives on 
market structure on the other hand. In particular, our conditions over model primitives imply that all SE 
are light-tailed, and therefore, in all SE the industry yields a fragmented market structure, and the AME 
property is satisfied. 

Formally, recall that the conditions required to establish the main results of this paper are Assumption 1
(used to ensure continuity properties); Assumption 2 and/or 3 (used to ensure convexity of the image of
$\Phi$); and Assumption 4 (used to ensure the existence of a compact subset $\mathfrak{C} \subseteq \mathfrak{F}$ such
that $\Phi(\mathfrak{F}) \subseteq \mathfrak{C}$). Of
these properties, continuity and convexity are typically straightforward to guarantee in each of the models 
we consider below. Thus we primarily focus on the role of Assumption 4. 

7.1 Dynamic Oligopoly Models 

In this section, we provide the proof of Proposition 1. Note that the payoff function depends on the
distribution $f$ via its $\theta_1$ moment, and hence we endow the set of distributions with the topology induced by
the 1-p norm with $p = \theta_1$. Since the payoff is continuous and nonincreasing in the $\theta_1$ moment of $f$, and
the transition kernel is independent of $f$, it is straightforward to check that Assumption 1 holds. In addition,
Doraszelski and Satterthwaite (2010) show that the transition kernel of this model satisfies Assumption 2 (it 
can also be shown that Assumption 3 is satisfied). 

Thus the desired result reduces to determining whether Assumption 4 holds. It is straightforward to
check that Assumptions 4.2-4.4 hold; we omit the details. Assumption 4.5 holds because positive drift is
costly, as the kernel defined above exhibits depreciation; in particular, it suffices to set $A' = \{0\}$. Thus the
central condition to check in this model is Assumption 4.1. This assumption holds if and only if $\theta_1 < 1$: in
this case, $\sup_{a, f} \left[\pi(x + \Delta, a, f) - \pi(x, a, f)\right] \to 0$ as $x \to \infty$ for all $\Delta \ge 0$.
Using Corollary 1 and Theorem 2, the result follows.
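The role of the threshold at 1 can be seen with a stylized stand-in for the payoff. The sketch below assumes
that the state enters the payoff through a term proportional to $(1 + x)^{\theta_1}$; this is an assumption made
for illustration only, since the actual payoff is specified in Section 4 and also depends on $a$ and $f$. The
function name `increment` is hypothetical.

```python
def increment(x, delta, theta):
    """(1 + x + delta)**theta - (1 + x)**theta for the stylized term (1 + x)**theta."""
    return (1.0 + x + delta) ** theta - (1.0 + x) ** theta


# Gain from one extra unit of state at increasingly large states.
for theta in (0.5, 1.0, 1.5):
    print(theta, [increment(x, 1, theta) for x in (10, 10**3, 10**6)])
```

For an exponent below 1 the increments shrink toward zero as the state grows, at exponent 1 they stay constant,
and above 1 they grow without bound, which matches the dichotomy between decreasing and increasing returns to
larger states described in the text.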
Thus, existence of SE and the AME property are closely tied to the parameter $\theta_1$, which represents how
much consumers value the quality of the product. For $\theta_1 < 1$, the firms have decreasing marginal returns in
their payoff from increasing their state. This ensures that the Markov process associated with a single firm's
optimization problem is stable, which in turn ensures that the range of $\Phi$ is compact. As discussed earlier,
this condition leads to a natural separation between industries where we expect to see a fragmented market 
structure and the industries where market concentration is likely to result in the limit. 

7.2 Dynamic Oligopoly Models with Positive Spillovers 

In this section we provide the proof of Proposition 2. Assumptions 1 and 2 follow as in the preceding result;
the proof is omitted. Again we focus on Assumption 4. Assumptions 4.1, 4.2, 4.3, and the first part of 4.4
hold as before; we omit the details. The key assumptions that we need to verify are thus the second part of 
Assumption 4.4, and Assumption 4.5. 

Observe that the maximum possible value of the effective investment when a firm takes action $a$ is
$e_{\max}(a) = a + \gamma \sup_y \zeta(y)$. A straightforward calculation yields:

$$\sup_f \sum_z z\, Q(z \mid x, a, f) = (1 - \delta)\left(\frac{\alpha e_{\max}(a)}{1 + \alpha e_{\max}(a)}\right) - \delta\left(\frac{1}{1 + \alpha e_{\max}(a)}\right) = \frac{\alpha e_{\max}(a)}{1 + \alpha e_{\max}(a)} - \delta. \tag{12}$$

It follows from the definition of the transition kernel that the second part of Assumption 4.4 holds. In order
for Assumption 4.5 to hold with $A' = \{0\}$, it follows that we need:

$$\gamma < \frac{\delta}{(1 - \delta)\, \alpha \sup_y \zeta(y)}.$$

Using Corollary 1 and Theorem 2, we conclude that the result of the proposition follows if (10) holds. 
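The algebra behind (12) and the resulting bound on the spillover weight can be checked numerically. The sketch
below takes the two-term expression and its simplified form as given, writes `zeta_sup` for $\sup_y \zeta(y)$,
and uses hypothetical parameter values; the function names are illustrative.

```python
def drift_two_terms(a, alpha, delta, gamma, zeta_sup):
    """First form in (12): (1 - delta) * up-probability minus delta * down-probability."""
    e_max = a + gamma * zeta_sup
    return (1.0 - delta) * alpha * e_max / (1.0 + alpha * e_max) - delta / (1.0 + alpha * e_max)


def max_drift(a, alpha, delta, gamma, zeta_sup):
    """Simplified form in (12): alpha * e_max / (1 + alpha * e_max) - delta."""
    e_max = a + gamma * zeta_sup
    return alpha * e_max / (1.0 + alpha * e_max) - delta


def gamma_threshold(alpha, delta, zeta_sup):
    """Value of gamma at which the drift at a = 0 is exactly zero."""
    return delta / ((1.0 - delta) * alpha * zeta_sup)


# Hypothetical parameters: the threshold equals 1.0 here.
ALPHA, DELTA, ZETA_SUP = 1.0, 0.5, 1.0
for gamma in (0.5, 1.0, 2.0):
    print(gamma, max_drift(0.0, ALPHA, DELTA, gamma, ZETA_SUP))
```

Below the threshold the worst-case drift at zero investment is negative, at the threshold it is zero, and above
it the spillover alone can push a firm's state upward, so Assumption 4.5 with $A' = \{0\}$ fails.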

For industries with spillovers, the compactness assumption requires that the spillover effect is not too
large relative to depreciation. This, along with decreasing marginal returns in the payoff, ensures that the
firms do not have unbounded growth in their state. As a result, the market structure becomes fragmented in 
the limit of a large number of firms. 

7.3 Learning-By-Doing 

In this section, we provide the proof of Proposition 3. Since P is decreasing and C(x, s) is decreasing in x, 
Assumption 1 follows in a straightforward manner in this model, as long as P is continuous. Since this 
is a model with finite action spaces, the result of Proposition 5 also applies. Thus, as before, the proof is 
reduced to determining whether Assumption 4 holds for the given model. As in the preceding examples, it 
is straightforward to check that Assumptions 4.2, 4.3, and 4.4 hold. Note that
$\pi(x + \Delta, s, f) - \pi(x, s, f) = C(x, s) - C(x + \Delta, s)$ for all $x, s, f$ and $\Delta \ge 0$, and the
action space is finite. Thus Assumption 4.1 follows
since C(x, s) is nonnegative, decreasing, and convex in x. 
Therefore, our focus turns to Assumption 4.5. Using standard supermodularity arguments, it is simple
to check that any $s$ that maximizes $\pi(x, s, f)$ for some $x, f$ is contained in the interval $[0, s^*]$. In
particular, then, suppose that for all sufficiently large $x$ and all actions $s \in [0, s^*]$, we have
$\sum_z z\, Q(z \mid x, s) < 0$. Then Assumption 4.5 holds, so using Corollary 1 and Theorem 2 the result follows.

In learning-by-doing models, compactness of the image of $\Phi$ is ensured by requiring that the transition
kernel exhibits decreasing returns to higher states. In other words, if the productivity gains induced by 
learning-by-doing are reduced at larger states, light-tailed SE will exist and the AME property will hold. 
As discussed earlier this is consistent with the observation that a very strong learning-by-doing effect (that 
persists even at large scale) will likely lead to market concentration. 

8 Conclusions 

This paper considered stationary equilibrium in dynamic games with many players. Our main results provide
a parsimonious set of assumptions on the model primitives which ensure that a stationary equilibrium exists
in a large variety of games. We also showed that the same set of assumptions ensures that SE yield fragmented
market structures and are a good approximation to MPE in large finite games. Through a set of examples,
we illustrated that our conditions on model primitives can be naturally interpreted as enforcing "decreasing
returns to higher states."

We conclude by noting several extensions that can be developed for the models described here. 

1. Entry and exit. A natural extension, particularly relevant for dynamic oligopoly models, would be to
consider a scenario where agents (i.e., firms) make entry and exit decisions endogenously in equilib- 
rium. We conjecture that under some mild additional assumptions our results would extend to this 
setting. 

2. Connections between SE and oblivious equilibrium in finite models. In some contexts, particularly in
empirical settings, it may be more appropriate to work over a model with a finite number of agents. 
In these cases, as discussed in Section B, it is possible to define an "oblivious equilibrium" for finite 
models (Weintraub et al. 2008). We conjecture that under some additional technical conditions over 
the model primitives we can prove that a sequence of OE satisfies the AME property. 

3. Nonstationary equilibrium. Our focus was on SE because it is of practical interest and has received 
significant attention in the literature. We conjecture, however, that our results can be extended to 
nonstationary versions of an equilibrium concept based on averaging effects that could be used to 
approximate transitional short-run dynamics as opposed to long-run behavior.

We leave these directions for future research. 
A Extensions to the Basic Model

A.1 Heterogeneous Players

In this section, we study anonymous stochastic games with ex-ante heterogeneous players. To represent
this heterogeneity, at the beginning of the game, a player is assigned a type (denoted by $\theta$) that stays fixed
for the entire duration of the game. For simplicity, we assume that the players' types are randomly and
independently drawn out of a finite set $\Theta$ with a common prior distribution $T$. Let
$P(\cdot \mid x, a, f; \theta)$ and $\pi(x, a, f; \theta)$ denote the transition kernel and payoff of a type $\theta$ player.

To analyze a stochastic game with heterogeneous players, we define a new state as follows. Let
$\tilde{x} = (x, \theta)$ be an extended state; if a player's extended state is $\tilde{x}$, we interpret it to mean
that the player is in state $x$ and has a type $\theta$. We let $\tilde{X} = X \times \Theta$, where $\Theta$ is the set of
types, denote the expanded state space. Let $\tilde{f}$ denote a population state over the expanded state space,
i.e., $\tilde{f}$ is a distribution over $\tilde{X}$. Given $\tilde{f}$, we define $F(\tilde{f}) \in \mathfrak{F}$ by:

$$F(\tilde{f})(x) = \sum_{\theta} \tilde{f}(x, \theta).$$
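The map $F$ simply sums out the type coordinate. A minimal sketch, with a hypothetical two-type population and
the illustrative function name `marginal_over_types`:

```python
def marginal_over_types(f_ext):
    """Sum an extended population state {(x, theta): mass} over types to get {x: mass}."""
    f = {}
    for (x, _theta), mass in f_ext.items():
        f[x] = f.get(x, 0.0) + mass
    return f


# Hypothetical extended population state with types "A" and "B".
F_EXT = {(0, "A"): 0.2, (0, "B"): 0.3, (1, "A"): 0.5}

print(marginal_over_types(F_EXT))
```

The result is a population state over the original state space, which is the object that the original payoff and
transition kernel take as input.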

We have the following two definitions:

$$\tilde{\pi}(\tilde{x}, a, \tilde{f}) = \pi(x, a, F(\tilde{f}); \theta);$$

$$\tilde{P}(\tilde{x}' \mid \tilde{x}, a, \tilde{f}) = \begin{cases} 0, & \text{if } \theta' \ne \theta; \\ P(x' \mid x, a, F(\tilde{f}); \theta), & \text{if } \theta' = \theta. \end{cases}$$

Here $\tilde{x} = (x, \theta)$ and $\tilde{x}' = (x', \theta')$ are extended states. These definitions simply map the
payoff and transition kernel with respect to the extended state back to the payoff and transition kernel in
the original game. It can be verified that if the original game satisfies Assumptions 1, 2 or 3, and 4 for
each $\theta$, then the extended game defined in this way satisfies the same assumptions as well. Thus all our
preceding results apply even in games with heterogeneous players. Because strategies are a function of the
extended state, in this case players of different types will use different strategies.

A.2 Coupling Through Actions

In the main development of this paper, we considered anonymous stochastic games where players are coupled
to each other via the population state as defined in equation (2); note, in particular, that the population
state gives the fraction of players at each state. As discussed in the Introduction, however, in many models of
interest the transition kernel and payoff of a player may depend on both the current state and current actions
of other players. In particular, the example in Section 4.3 is a model where players are coupled through their
actions.

To formally model such a scenario, we consider an $m$ player stochastic game being played in discrete
time over the infinite horizon, where again the payoff and transition kernel of a player are denoted by
$\pi(x, a, f)$ and $P(\cdot \mid x, a, f)$ respectively.$^8$ However, we now assume that $f$ is a distribution over both
states and actions. We refer to $f$ as the population state-action profile (to distinguish it from just the
population state, which is the marginal distribution of $f$ over $X$). For simplicity, since the prior
development in this paper assumes state spaces are discrete, for the purposes of this subsection we restrict
attention to a game with a finite action space $S$ contained in an integer lattice, cf. Definition 9; in
particular, we assume that players maximize payoffs with respect to randomized strategies over $S$. Thus the
population state-action profile is a distribution over $X \times S$.

We again let $x_{i,t} \in X$ be the state of player $i$ at time $t$, where $X \subseteq \mathbb{Z}^d$. We let
$s_{i,t} \in S$ be the (pure) action taken by player $i$ at time $t$. Let $f^{(m)}_{-i,t}$ denote the empirical
population state-action profile at time $t$ in an $m$-player game; in other words, $f^{(m)}_{-i,t}(x, s)$ is the
fraction of players other than $i$ at state $x$ who play $s$ at time $t$. With these definitions, $x_{i,t}$ evolves
according to the transition kernel $P$ as before, i.e.,
$x_{i,t+1} \sim P(\cdot \mid x_{i,t}, a_{i,t}, f^{(m)}_{-i,t})$.

A player acts to maximize their expected discounted payoff, as before. Note that a potential challenge 
here is that a player's time t payoff and transition kernel depend on the actions of his competitors, which 
are chosen simultaneously with his own action. Thus to evaluate the time t expected payoffs and transi- 
tion kernel, a player must take an expectation with respect to the randomized strategies employed by his 
competitors. 

Our first step is to extend the appropriate assumptions to this game model. Let $\mathfrak{F}$ now denote the set
of all distributions over $X \times S$, and let $\mathfrak{F}_p$ denote the set of all distributions in $\mathfrak{F}$ with
finite 1-p norm as before. Assumptions 1 and 4 thus extend naturally to games with coupling through actions,
with these new interpretations of $\mathfrak{F}$ and $\mathfrak{F}_p$.

The AME property continues to hold for games with coupling through actions. Recall that in the proof
of Theorem 2, we establish that if $(\mu, f)$ is a stationary equilibrium, then
$\|f^{(m)}_{-i,t} - f\|_{1\text{-}p} \to 0$ almost surely for all $t$, if players' initial states are sampled
independently from $f$, all players other than $i$ follow strategy $\mu$, and player $i$ follows any strategy. (See
Lemma 10 in the Appendix.) In a game with coupling through actions, $f^{(m)}_{-i,t}$ also tracks the empirical
distribution of players' actions. However, since all players other than $i$ use the same oblivious strategy
$\mu$, and since the base action space $S$ is finite, it is straightforward to extend the argument of Lemma 10 to
the current setting. The remainder of the existing proof of Theorem 2 carries over essentially unchanged under
Assumption 1; for brevity we omit the details.

8 For the purposes of this subsection we assume players are homogeneous.

Next, recall that to prove existence of a stationary equilibrium, we consider two maps: $\mathcal{P}(f)$ (which
identifies the set of optimal oblivious strategies given $f$), and $\mathcal{D}(\mu, f)$ (which identifies the set of
invariant distributions of the Markov process induced by $\mu$ and $f$). The analysis of $\mathcal{P}(f)$ proceeds
exactly as before (but with randomized strategies, as discussed in Section 5.2.1). However, in a game with
coupling through actions, we redefine $\mathcal{D}(\mu, f)$ to be the set of invariant distributions over $X \times S$
induced by $\mu$ and $f$. In other words, $f' \in \mathcal{D}(\mu, f)$ if and only if there exists a distribution $g$
over $X$ such that the following two conditions hold:

$$g(x') = \sum_{x} g(x)\, P(x' \mid x, \mu(x), f), \quad \text{for all } x';$$

$$f'(x, s) = g(x) \cdot \mu(x)(s), \quad \text{for all } x, s.$$

Note that here p,(x)(s) is the probability assigned to pure action s by the randomized strategy p at state x. 
The first equation requires that g is an invariant distribution of the state Markov process induced by p and / 
(recall Definition 9 of the transition kernel with mixed actions). The second equation requires /' to be 
derived from g in the natural way, via p. As before, we let <£(/) = V(V(f), /). 
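The two conditions above can be illustrated numerically. The following is a minimal sketch, assuming a finite (truncated) state space, a fixed population profile $f$ already baked into the kernel, and hypothetical helper names (`mixed_kernel`, `invariant_distribution`, `state_action_profile`) that do not appear in the paper; the numbers are made up for illustration.

```python
# Sketch: invariant distribution g under a randomized strategy mu, and the
# induced state-action profile f'(x, s) = g(x) * mu(x)(s).
# Assumptions (not from the paper): 3 states, 2 pure actions, f held fixed.

def mixed_kernel(P, mu):
    """P[x][s][y]: prob. of moving x -> y under pure action s.
    mu[x][s]: prob. that the randomized strategy plays s at state x."""
    n = len(P)
    return [
        [sum(mu[x][s] * P[x][s][y] for s in range(len(mu[x]))) for y in range(n)]
        for x in range(n)
    ]

def invariant_distribution(K, iters=100_000, tol=1e-14):
    """Power iteration g <- g K, started from the uniform distribution."""
    n = len(K)
    g = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(g[x] * K[x][y] for x in range(n)) for y in range(n)]
        if max(abs(a - b) for a, b in zip(new, g)) < tol:
            return new
        g = new
    return g

def state_action_profile(g, mu):
    """f'(x, s) = g(x) * mu(x)(s)."""
    return [[g[x] * mu[x][s] for s in range(len(mu[x]))] for x in range(len(g))]

# Action 0 tends to lower the state, action 1 tends to raise it.
P = [
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]],
    [[0.6, 0.4, 0.0], [0.0, 0.5, 0.5]],
    [[0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
]
mu = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]

K = mixed_kernel(P, mu)
g = invariant_distribution(K)
f_prime = state_action_profile(g, mu)
```

By construction, the marginal of `f_prime` over actions recovers `g`, which is the sense in which $f'$ is "derived from $g$ in the natural way".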

It is now straightforward to show that if Assumption 1 holds, then the result of Proposition 4 holds, i.e., $\Phi$
has a closed graph. Further, if Assumptions 1 and 4 hold, then the result of Proposition 9 holds as well. From
this and the result in Proposition 5 we conclude that under those assumptions, a stationary equilibrium exists,
and all SE are light-tailed (i.e., have finite $1$-$p$ norm). The arguments involved are analogous to the existing
proofs, and we omit the details.

We conclude by commenting on the restriction that the action space must be finite. From a computational
standpoint this is not very restrictive, since in many applications discretization is required or can be used
efficiently. From a theoretical standpoint, we can analyze games with general compact Euclidean action
spaces using techniques similar to those in this paper, at the expense of additional measure-theoretic complexity,
since the population state-action profile is then a measure over a continuous extended state space.



B Approximation: Sequence of Payoff Functions 

In many models of interest it is more reasonable to assume that the payoff function explicitly depends on the
number of agents. To study these environments, in this section we consider a sequence of payoff functions
indexed by the number of agents, $\pi_m(x, a, f)$. Here, the profit function $\pi$ is a limit: $\lim_{m \to \infty} \pi_m(x, a, f) =
\pi(x, a, f)$. See Section 4 for concrete examples.

In this case, the actual expected net present value of a player using a cognizant strategy $\mu'$ when each
of the other $m - 1$ players uses an oblivious strategy $\mu$ is given by equation (5), but with $\pi$ replaced by
$\pi_m$. That is, if the number of players is $m$, the payoff obtained each period is given by $\pi_m$. Hence, with
some abuse of notation, for this section, we define:

$$V^{(m)}\big(x, f \mid \mu', \mu^{(m-1)}\big) = \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^t \pi_m\big(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t}\big) \,\Big|\, x_{i,0} = x,\ f^{(m)}_{-i,0} = f;\ \mu_i = \mu',\ \mu_{-i} = \mu^{(m-1)}\right]. \tag{13}$$

We generalize Theorem 2 for this setting. First, we need to strengthen Assumption 1.

Assumption 5. For each $m \in \mathbb{Z}_+$, Assumption 1 holds, with the following strengthened properties.

1. Equicontinuity. The set of functions $\{\pi_m(x, a, f) : m \in \mathbb{Z}_+\}$ is jointly equicontinuous$^9$ in $a \in A$
and $f \in \mathfrak{F}_p$.

2. Uniform growth rate bound. There exist constants $K$ and $n \in \mathbb{Z}_+$ such that $\sup_{m \in \mathbb{Z}_+,\, a \in A,\, f \in \mathfrak{F}_p} |\pi_m(x, a, f)| \le
K(1 + \|x\|_\infty)^n$ for every $x \in \mathcal{X}$.

The following result is more general than Theorem 2, because the payoff function in the AME property 
depends on m. 

Theorem 3 (AME). Suppose Assumption 5 holds. Let $(\mu, f)$ be a stationary equilibrium with $f \in \mathfrak{F}_p$. Then
the AME property holds for $(\mu, f)$.

The proof is similar to that of Theorem 2, but requires an additional step to accommodate the sequence of
payoff functions. However, note that as in Theorem 2, the stationary equilibrium $(\mu, f)$ is fixed and is
computed with the limit payoff function $\pi$. Alternatively, it is possible to define an "oblivious equilibrium"
(OE) for each finite model. An OE is similar to SE in the sense that agents optimize assuming that the long
run population state is constant; the main difference is that it is defined in a finite model rather than in the
limit model. Under a uniform light-tail condition, it can be shown that the sequence of OE satisfies the AME
property (Weintraub et al. 2008). In addition, we conjecture that a version of the assumptions that guarantee
existence of SE in Section 5, but that applies uniformly over all finite models, would guarantee that such
a uniform light-tail condition holds. For clarity of presentation, we chose to work with the SE of the limit
model directly.

Moreover, we believe that the existence result for the limit model that we provide is important, because 
even though OE might exist under mild conditions for each finite model, SE in the limit model may fail to 
exist. In particular, as we discuss in Section 4, this might be the case in applications that exhibit "increasing 
returns to scale". See in particular Sections 4.1, 4.2, and 4.3, for examples of how limit models are derived 
in specific applications, and also conditions in such models that ensure stationary equilibria provide accurate 
approximations. 

C Additional Examples 

In this section we present two additional applications of our results: a model of supply chain competition,
and a model of consumer learning.

C.1 Supply Chain Competition

We now consider an example of supply chain competition among firms (Cachon and Lariviere 1999), where 
the firms use a common resource that is sold by a single supplier. The firms only interact with each other in 
the sourcing stage as the goods produced are assumed to be sold in independent markets. 

9 Let $\mathcal{X}$ and $\mathcal{Y}$ be two metric spaces, with metrics $d_X$ and $d_Y$ respectively. A set of functions $\mathcal{F}$ mapping $\mathcal{X}$ to $\mathcal{Y}$ is said to be
equicontinuous at $x_0 \in \mathcal{X}$ if for every $\epsilon > 0$, there exists a $\delta > 0$ such that $d_Y(f(x), f(x_0)) < \epsilon$ for all $f \in \mathcal{F}$ and all $x$ such that
$d_X(x_0, x) < \delta$.

States. We let the state $x_{i,t}$ be the inventory of goods held by firm $i$ at time $t$.

Actions. At each time period, the supplier runs an auction to sell the goods. Each firm $i$ places a bid $a_{i,t}$
at time $t$; for example, $a_{i,t}$ may denote the willingness-to-pay of the supplier, or it may be a two-dimensional
bid consisting of desired payment and quantity. Since the interaction between firms is via their action profiles,
we again assume that the action taken by a firm lies in a finite subset $S$ of the integer lattice.

Transition dynamics. Suppose that each firm $i$ sees demand $d_{i,t}$ at time $t$; we assume $d_{i,t}$ are i.i.d. and
independent across firms, with bounded nonnegative support and positive expected value. Further, suppose
that when a firm bids $s$ and the population state-action profile is $f$, the firm receives an allocation $\xi(s, f)$.
Then the state evolution for a firm $i$ is given by $x_{i,t+1} = \max\{x_{i,t} - d_{i,t}, 0\} + \xi(s_{i,t}, f^{(m)}_{-i,t})$. Note that
$\xi$ depends on $f^{(m)}_{-i,t}$ only through the marginal distribution over actions. We make the natural assumptions
that $\xi(s, f)$ is increasing in $s$ and decreasing in $f$ (where the set of distributions is ordered in the first
order stochastic dominance sense). Thus the transition kernel captures inventory evolution in the usual way:
demand consumes inventory, and procurement restocks inventory. The amount of resource procured by a
firm and the price it pays depend on its own bid, as well as on the bids of other firms competing for the resource.

As one example of how $\xi$ might arise, suppose that the supplier uses a proportional allocation mechanism
(Kelly 1997). In such a mechanism, the bid $s$ denotes the total amount a firm pays. Further, suppose
the total quantity $Q_m$ of the resource available scales with the number of firms, i.e., $Q_m = mQ$. Let
$k(s \mid f) = \sum_x f(x, s)$ denote the fraction of agents bidding $s$ in population state-action profile $f$.

As $m \to \infty$, and introducing $R$ as a small "reserve" bid that ensures the denominator is always nonzero,
we obtain the following limiting proportional allocation function: $\xi(s, f) = sQ / \big(R + \sum_{s'} s' k(s' \mid f)\big)$. Note
that this expression is increasing in $s$ and decreasing in $f$.
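The limiting allocation rule can be written out in a few lines. This is a minimal sketch, assuming the profile $f$ is stored as a dictionary from (state, bid) pairs to mass; the function names and the numerical values are illustrative and not taken from the paper.

```python
# Sketch of the limiting proportional allocation xi(s, f) = s*Q / (R + sum_s' s' k(s'|f)).
# Assumption: f is a dict {(x, s): mass} whose masses sum to one.

def bid_marginal(f):
    """k(s | f) = sum_x f(x, s): fraction of agents bidding s."""
    k = {}
    for (_, s), mass in f.items():
        k[s] = k.get(s, 0.0) + mass
    return k

def proportional_allocation(s, f, Q, R):
    """Allocation received by a firm bidding s against population profile f."""
    mean_bid = sum(bid * mass for bid, mass in bid_marginal(f).items())
    return s * Q / (R + mean_bid)

f_low = {(0, 1): 0.5, (3, 2): 0.5}   # average bid 1.5
f_high = {(0, 2): 1.0}               # average bid 2.0 (dominates f_low in the bid marginal)

alloc_low = proportional_allocation(1, f_low, Q=2.0, R=0.5)
alloc_high = proportional_allocation(1, f_high, Q=2.0, R=0.5)
```

With these numbers a unit bid receives a smaller allocation against `f_high` than against `f_low`, and a zero bid receives nothing, which matches the monotonicity remarks in the text.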

Payoffs. A firm earns revenue for demand served, and incurs a cost both for holding inventory, as well
as for procuring additional goods via restocking. We assume every firm faces an exogenous retail price $\phi$.
(Heterogeneity in the retail price could be captured via the description in Section 3.5.) Let $h$ be the unit
cost of holding inventory for one period and let $\Omega(s, f)$ be the procurement payment made by a firm with
bid $s$, when the population state-action profile is $f$; of course, $\Omega$ also depends on $f$ only through $k(\cdot \mid f)$.
In general we assume that $\Omega$ is increasing in $f$ for each fixed $s$. In the proportional allocation mechanism
described above, we simply have $\Omega(s, f) = s$. Since the demand is i.i.d., the single period payoff for a
firm is given by the expected payoff it receives, where the expectation is over the demand uncertainty; i.e.,
$\pi(x, s, f) = \phi\, \mathbb{E}[\min\{d, x\}] - hx - \Omega(s, f)$.
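For concreteness, the single period payoff and the inventory update can be sketched as follows. The sketch assumes a discrete demand distribution given as a dictionary and the proportional-mechanism payment (the bid itself); all names and numbers are hypothetical.

```python
# Sketch of pi(x, s, f) = phi * E[min{d, x}] - h*x - payment and of the
# inventory update x' = max{x - d, 0} + allocation.
# Assumptions: demand_pmf is {demand: probability}; payment equals the bid s.

def expected_sales(x, demand_pmf):
    """E[min{d, x}] for inventory level x."""
    return sum(p * min(d, x) for d, p in demand_pmf.items())

def payoff(x, s, phi, h, demand_pmf):
    """Revenue from demand served, minus holding cost, minus procurement payment."""
    return phi * expected_sales(x, demand_pmf) - h * x - s

def next_inventory(x, d, allocation):
    """Demand consumes inventory; procurement restocks it."""
    return max(x - d, 0) + allocation

demand_pmf = {0: 0.5, 2: 0.5}   # bounded support, positive mean
phi, h = 4.0, 1.0

small = payoff(1, 1, phi, h, demand_pmf)
large = [payoff(x, 1, phi, h, demand_pmf) for x in (10, 11, 12)]
```

Once inventory exceeds the largest possible demand, expected sales stop growing and each extra unit only adds holding cost, so the payoff falls by $h$ per unit. This is the "decreasing in $x$ for large $x$" property used in the proof of Proposition 10.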

Discussion. We have the following proposition. 

Proposition 10. Suppose that d has positive expected value. Then there exists an SE for the supply chain 
competition model with the proportional allocation mechanism, and all SE possess the AME property. 

Proof. We present the proof in a more general setting, and specialize to the proportional allocation mechanism.
If $\xi$ and $\Omega$ are uniformly bounded and appropriately continuous in $f$ for each pure action $s$, then
Assumption 1 follows in a straightforward manner. For example, in the proportional allocation mechanism
with a positive reserve bid $R$, note that $\xi$ is continuous in $f$ in the $1$-$p$ norm with $p = 1$, since $\xi$ depends
on $f$ through its first moment. Since this is a model with finite action spaces, the result of Proposition 5
also applies. Thus, as before, the proof is reduced to determining whether Assumption 4 holds for the given
model.

As before, Assumption 4.2, Assumption 4.3, and Assumption 4.4 are easy to check. Assumption 4.1
follows because the payoff function is decreasing in $x$ for large $x$. Finally, suppose $0 \in S$ and $\xi(0, f) = 0$
for all $f$; this will be the case, for example, in the proportional allocation mechanism with reserve $R$. Then
if $A' = \{0\}$, it follows that Assumption 4.5 holds, as long as (1) $d$ has positive expected value; and (2)
bidding zero is myopically optimal, and this induces negative drift in the inventory level. Note that bidding
zero is myopically optimal for the proportional allocation mechanism, and this induces negative drift in the
inventory. Using Corollary 1 and Theorem 2 the result follows. ■

More generally, for other choices of allocation mechanism, it can be shown that the same result holds if
$d$ has positive expected value and the following conditions hold: (1) $\xi$ and $\Omega$ are uniformly bounded and
appropriately continuous in $f$ for each pure action $s$; (2) $0 \in S$ and $\xi(0, f) = 0$ for all $f$; and (3) bidding
zero maximizes a firm's single period payoff, and this induces negative drift in the inventory level.

In this model, decreasing returns to higher states are naturally enforced because the payoff function 
becomes decreasing in the state as the state grows. Simply because holding inventory is costly, firms prefer 
not to become arbitrarily large. Thus in this model light tails in the population state can be guaranteed under 
fairly weak assumptions on the model primitives. 

C.2 Consumer Learning 

In this section, we analyze a model of social learning. Imagine a scenario where a group of individuals decide 
to consume a product (e.g., visiting a restaurant). These individuals learn from each other's experience, 
perhaps through product reviews or word-of-mouth (see, for example, Ching 2010). 
States. We let $x_{i,t}$ be the experience level of an individual at time $t$.

Actions. At each time period $t$, an individual invests an "effort" $a_{i,t} \in [0, \bar{a}]$ in searching for a new
product.

Payoffs. At each time period, an individual selects a product to consume. The quality of the product is a
normally distributed random variable $Q$ with a distribution given by $Q \sim N(\gamma a, \omega(x, f))$, where $\gamma > 0$ is a
constant. Thus, the average quality of the product is proportional to the amount of effort made. Furthermore,
the variance of the product quality depends on both individual and population experience levels.

We assume that $\omega(x, f)$ is continuous in the population state $f$ (in an appropriate norm, cf. Section
5). We make the natural assumption that $\omega(x, f)$ is a nonincreasing function of $f$ and strictly decreasing
in $x$ (where the set of distributions is ordered in the first order stochastic dominance sense). This is natural
as we expect that as an individual's experience increases or if she can learn from highly expert people, the
randomness in choosing a product will decrease. We also assume that there exist constants $\sigma_L, \sigma_H$ such
that $\sigma_L^2 \le \omega(x, f) \le \sigma_H^2$.

The individual receives a utility $U(Q)$, where $U(\cdot)$ is a nondecreasing concave function of the quality.
For concreteness, we let $U(Q) = 1 - e^{-Q}$. Since at each time, the individual selects the product or the restaurant
in an i.i.d. manner, the single period payoff is given by $\pi(x, a, f) = \mathbb{E}[U(Q) \mid Q \sim N(\gamma a, \omega(x, f))] -
da = 1 - e^{-\gamma a + \frac{1}{2}\omega(x, f)} - da$, where $d$ is the marginal cost of effort.
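The closed form of the expected utility follows from the moment generating function of a normal random variable, $\mathbb{E}[e^{-Q}] = e^{-\gamma a + \omega/2}$. The sketch below checks this numerically with a plain trapezoid rule; the function names, the integration window, and the parameter values are assumptions made only for illustration.

```python
# Sketch: compare E[1 - exp(-Q)] for Q ~ N(mean, var) against the closed form
# 1 - exp(-mean + var/2). Trapezoid integration over +/- 12 standard deviations.
import math

def expected_utility_closed_form(mean, var):
    return 1.0 - math.exp(-mean + 0.5 * var)

def expected_utility_numeric(mean, var, half_width=12.0, steps=20000):
    sd = math.sqrt(var)
    lo = mean - half_width * sd
    h = 2.0 * half_width * sd / steps
    total = 0.0
    for i in range(steps + 1):
        q = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-0.5 * ((q - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += weight * (1.0 - math.exp(-q)) * density
    return total * h

gamma, a, omega = 1.6, 0.5, 0.4   # illustrative values; mean quality is gamma * a
closed = expected_utility_closed_form(gamma * a, omega)
numeric = expected_utility_numeric(gamma * a, omega)
```

Subtracting the effort cost $da$ from either quantity gives the single period payoff $\pi(x, a, f)$ at the chosen parameters.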






Transition dynamics. An individual's experience level is improved as she expends effort because she 
learns more about the quality of products. However, this experience level also depreciates over time; this 
depreciation is assumed to be player-specific and comes about because an individual's tastes may change 
over time. Thus, an individual's experience evolves (independently of the experience of others or their in- 
vestments) in a stochastic manner. Several specifications for the transition kernel satisfying our assumptions 
can be used; for concreteness we assume that the dynamics are the same as those described in Section 4.1. 

Discussion. Our main result is the following proposition. 

Proposition 11. Suppose that:

$$d > \gamma e^{-\gamma c_0 + \frac{1}{2}\sigma_H^2}, \tag{14}$$

where $c_0 = \delta/(\alpha(1 - \delta))$. Then there exists an SE for the consumer learning model, and all SE possess the
AME property.

Proof. Note that $\omega(x, f) \le \sigma_H^2$ and thus the growth rate bound in Assumption 1 is trivially satisfied. If
$\omega(x, f)$ is continuous in $f$ (in the appropriate $1$-$p$ norm), then Assumption 1 follows in a straightforward
manner. To verify that $\Phi(f)$ is convex, we note that Assumption 3 will hold if $\pi(x, a, f)$ is strictly increasing
in $x$ and concave in $a$. Since $\omega(x, f)$ is strictly decreasing in $x$, these conditions are naturally satisfied for
our model. Thus, to complete the proof, we need to verify Assumption 4.

It is straightforward to check that Assumptions 4.2-4.4 hold; we omit the details. Assumption 4.1 follows
since $\omega(x, f)$ is nonincreasing in $x$ and bounded below, so $\omega(x, f) - \omega(x + \Delta, f) \to 0$ as $x \to \infty$. In order
for Assumption 4.5 to hold, we require $A'$ to contain all myopically optimal actions. A straightforward
calculation shows that $\arg\max_a \pi(x, a, f) = a^*(x, f)$, where

$$a^*(x, f) = \frac{1}{2\gamma}\,\omega(x, f) - \frac{1}{\gamma}\log\left(\frac{d}{\gamma}\right);$$

for simplicity we assume $0 < a^*(x, f) < \bar{a}$ for all $x, f$, though an analogous argument holds otherwise.
Thus we define $A' = [0, a_{\max}]$, where:

$$a_{\max} = \frac{1}{2\gamma}\,\sigma_H^2 - \frac{1}{\gamma}\log\left(\frac{d}{\gamma}\right).$$

To verify Assumption 4.5(a), note that if $a \notin A'$, then:

$$\pi(x, a^*(x, f), f) - \pi(x, a, f) = e^{-\gamma a^*(x, f) + \frac{1}{2}\omega(x, f)}\left(e^{-\gamma(a - a^*(x, f))} - 1\right) + d\,(a - a^*(x, f))
= \frac{d}{\gamma}\,\kappa(a - a^*(x, f)),$$

where $\kappa(x) = e^{-\gamma x} - 1 + \gamma x$, which is strictly increasing and nonnegative on $x \ge 0$ with $\kappa(0) = 0$. Here the preceding
derivation follows by observing that the optimality condition for $a^*(x, f)$ ensures that $\gamma e^{-\gamma a^*(x, f) + \frac{1}{2}\omega(x, f)} =
d$. Thus Assumption 4.5(a) holds.
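The first-order condition and the payoff-gap identity used for Assumption 4.5(a) can be checked numerically. This is a small sketch with hypothetical parameter values; the functions `payoff`, `a_star`, and `kappa` are illustrative names, with `omega` standing for a fixed value of the variance term.

```python
# Sketch: the myopically optimal effort a* and the identity
# pi(a*) - pi(a) = (d / gamma) * kappa(a - a*), with kappa(z) = exp(-gamma z) - 1 + gamma z.
import math

def payoff(a, omega, gamma, d):
    """Single period payoff 1 - exp(-gamma*a + omega/2) - d*a."""
    return 1.0 - math.exp(-gamma * a + 0.5 * omega) - d * a

def a_star(omega, gamma, d):
    """Solves gamma * exp(-gamma*a + omega/2) = d for a."""
    return omega / (2.0 * gamma) - math.log(d / gamma) / gamma

def kappa(z, gamma):
    return math.exp(-gamma * z) - 1.0 + gamma * z

gamma, d, omega = 1.0, 0.5, 0.4   # illustrative values only
opt = a_star(omega, gamma, d)
gaps = [
    (payoff(opt, omega, gamma, d) - payoff(a, omega, gamma, d),
     (d / gamma) * kappa(a - opt, gamma))
    for a in (1.0, 1.5, 2.0)
]
```

Each pair in `gaps` holds the direct payoff difference and the value predicted by the identity; the two agree up to floating point error, and the gap grows as $a$ moves further above $a^*$.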

When do the actions in $A'$ produce negative drift in the state? For the dynamics given in Section 4.1, one
can easily verify that the drift is negative if the action is sufficiently small; in particular, the drift is negative
for any action $a$ such that:

$$a < \frac{\delta}{\alpha(1 - \delta)},$$

where $\delta \in (0, 1)$ is the probability that the experience depreciates and $\alpha > 0$ controls the probability that a
player is successful in improving the experience. The above inequality is satisfied by all $a' \in A'$ if:

$$d > \gamma e^{-\gamma c_0 + \frac{1}{2}\sigma_H^2}.$$

Using Corollary 1 and Theorem 2 the result follows. ■
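Condition (14) is a rearrangement of the requirement that the largest action in $A'$ lies below the drift threshold $c_0$. The sketch below checks this equivalence on a few values of $d$; parameter values and function names are assumptions made for illustration.

```python
# Sketch: condition (14), d > gamma * exp(-gamma*c0 + sigma_H^2 / 2),
# holds exactly when a_max < c0 = delta / (alpha * (1 - delta)).
import math

def drift_threshold(delta, alpha):
    return delta / (alpha * (1.0 - delta))

def a_max(sigma_h_sq, gamma, d):
    return sigma_h_sq / (2.0 * gamma) - math.log(d / gamma) / gamma

def condition_14(d, gamma, sigma_h_sq, delta, alpha):
    c0 = drift_threshold(delta, alpha)
    return d > gamma * math.exp(-gamma * c0 + 0.5 * sigma_h_sq)

gamma, sigma_h_sq, delta, alpha = 1.0, 0.4, 0.5, 1.0   # illustrative; here c0 = 1
c0 = drift_threshold(delta, alpha)
checks = [
    (d, condition_14(d, gamma, sigma_h_sq, delta, alpha), a_max(sigma_h_sq, gamma, d) < c0)
    for d in (0.30, 0.40, 0.50, 0.80)
]
```

With these numbers the threshold on $d$ is $e^{-0.8} \approx 0.449$, so the two smaller costs fail the condition and the two larger ones satisfy it.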

Recall that $\delta \in (0, 1)$ is the probability that the experience depreciates and $\alpha > 0$ controls the probability
that a player is successful in improving the experience. The right hand side is an upper bound to the marginal
gain in utility due to effort, at effort level $c_0$; while the left hand side is the marginal cost of effort. Thus
the condition (14) can be interpreted as a requirement that the marginal cost of effort should be sufficiently
large relative to the marginal gain in utility due to effort. Otherwise, an individual's effort level when her
experience is high will cause her state to continue to increase, so a light-tailed SE may not exist. Hence we
see the same dichotomy as before: decreasing returns to higher states yield existence of SE and the AME
property, while increasing returns may not.



D Existence and AME: Preliminary Lemmas 

We begin with the following lemma, which follows from the growth rate bound and bounded increments in 
Assumption 1. 

Lemma 1. Suppose Assumption 1 holds. Let $x_0 = x$. Let $a_t \in A$ be any sequence of (possibly history
dependent) actions, and let $f_t \in \mathfrak{F}$ be any sequence of (possibly history dependent) population states.
Let $x_t$ be the state sequence generated, i.e., $x_t \sim \mathbf{P}(\cdot \mid x_{t-1}, a_{t-1}, f_{t-1})$. Then for all $T \ge 0$, there exists
$C(x, T) < \infty$ such that $\mathbb{E}\left[\sum_{t=T}^{\infty} \beta^t |\pi(x_t, a_t, f_t)| \,\big|\, x_0 = x\right] \le C(x, T)$. Further, $C(x, T) \to 0$ as $T \to \infty$.



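Proof. Observe that by Assumption 1, the increments are bounded. Thus starting from state $x$, we have
$\|x_t\|_\infty \le \|x\|_\infty + tM$. Again by Assumption 1, $|\pi(x_t, a_t, f_t)| \le K(1 + \|x_t\|_\infty)^n$. Therefore:

$$\mathbb{E}\left[\sum_{t=T}^{\infty} \beta^t |\pi(x_t, a_t, f_t)| \,\Big|\, x_0 = x\right] \le K \sum_{t=T}^{\infty} \beta^t (1 + \|x\|_\infty + tM)^n.$$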

We define $C(x, 0)$ as the right hand side above when $T = 0$:

$$C(x, 0) = K \sum_{t=0}^{\infty} \beta^t (1 + \|x\|_\infty + tM)^n.$$

Observe that $C(x, 0) < \infty$. We now reason as follows for $T \ge 1$:

$$\begin{aligned}
K \sum_{t=T}^{\infty} \beta^t (1 + \|x\|_\infty + tM)^n &= K \beta^T \sum_{t=0}^{\infty} \beta^t (1 + \|x\|_\infty + tM + TM)^n \\
&= K \beta^T \sum_{t=0}^{\infty} \beta^t \sum_{j=0}^{n} \binom{n}{j} (1 + \|x\|_\infty + tM)^j (TM)^{n-j} \\
&\le K \beta^T \sum_{t=0}^{\infty} \beta^t \sum_{j=0}^{n} \binom{n}{j} (1 + \|x\|_\infty + tM)^n (TM)^n \\
&= K \beta^T 2^n (TM)^n \sum_{t=0}^{\infty} \beta^t (1 + \|x\|_\infty + tM)^n \\
&= C(x, 0)\, \beta^T (2MT)^n.
\end{aligned}$$

Here the inequality holds because $1 + \|x\|_\infty + tM \ge 1$, $M > 0$, and $T \ge 1$. So for $T \ge 1$, define:

$$C(x, T) = C(x, 0)\, \beta^T (2MT)^n. \tag{15}$$

Then $C(x, T) \to 0$ as $T \to \infty$, as required.
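The tail bound (15) can be sanity-checked numerically. The sketch below is only an illustration: it truncates the infinite sums at a large horizon, and it assumes $M \ge 1$ (as for integer-valued increments), which is the case where the step $(TM)^{n-j} \le (TM)^n$ is immediate. All names and parameter values are hypothetical.

```python
# Sketch: compare the discounted tail sum K * sum_{t >= T} beta^t (1 + |x| + tM)^n
# with the bound C(x, T) = C(x, 0) * beta^T * (2MT)^n.
# Assumptions: sums truncated at `horizon`; increment bound M >= 1.

def tail_sum(x_norm, T, K, beta, M, n, horizon=5000):
    return K * sum(beta ** t * (1 + x_norm + t * M) ** n for t in range(T, horizon))

def C(x_norm, T, K, beta, M, n):
    c0 = tail_sum(x_norm, 0, K, beta, M, n)
    return c0 if T == 0 else c0 * beta ** T * (2 * M * T) ** n

params = dict(K=1.0, beta=0.9, M=1, n=2)
x_norm = 3
pairs = [(tail_sum(x_norm, T, **params), C(x_norm, T, **params)) for T in range(1, 61)]
late = [C(x_norm, T, **params) for T in (200, 400, 800)]
```

The bound is loose for small $T$ (it can exceed $C(x, 0)$), but the geometric factor $\beta^T$ eventually dominates the polynomial factor $(2MT)^n$, so the values in `late` decrease toward zero.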



We now show that the Bellman equation holds for the dynamic program solved by a single agent given
a population state $f$. Given our unbounded state space, our proof involves the use of a weighted sup norm,
defined as follows. For each $x \in \mathcal{X}$, let $W(x) = (1 + \|x\|_\infty)^n$. For a function $F : \mathcal{X} \to \mathbb{R}$, define:

$$\|F\|_{W\text{-}\infty} = \sup_{x \in \mathcal{X}} \left| \frac{F(x)}{W(x)} \right|.$$

This is the weighted sup norm with weight function $W$. We let $B(\mathcal{X})$ denote the set of all functions $F :
\mathcal{X} \to \mathbb{R}$ such that $\|F\|_{W\text{-}\infty} < \infty$.

Let $T_f$ denote the dynamic programming operator with population state $f$: given a function $F : \mathcal{X} \to \mathbb{R}$,
we have $(T_f F)(x) = \sup_{a \in A} \left\{ \pi(x, a, f) + \beta \sum_{x' \in \mathcal{X}} F(x')\, \mathbf{P}(x' \mid x, a, f) \right\}$. We define $T_f^k$ to be the
composition of the mapping $T_f$ with itself $k$ times. The following lemma applies standard dynamic programming
arguments.

Lemma 2. Suppose Assumption 1 holds. For all $f \in \mathfrak{F}$, if $F \in B(\mathcal{X})$ then $T_f F \in B(\mathcal{X})$. Further,
there exist $k, \rho$ independent of $f$ with $0 < \rho < 1$ such that $T_f$ is a $k$-stage $\rho$-contraction on $B(\mathcal{X})$; i.e., if
$F, F' \in B(\mathcal{X})$, then for all $f$:

$$\|T_f^k F - T_f^k F'\|_{W\text{-}\infty} \le \rho\, \|F - F'\|_{W\text{-}\infty}.$$

In particular, value iteration converges to $V^*(\cdot \mid f) \in B(\mathcal{X})$ from any initial value function in $B(\mathcal{X})$,
and for all $f \in \mathfrak{F}$ and $x \in \mathcal{X}$, the Bellman equation holds:

$$V^*(x \mid f) = \sup_{a \in A} \left\{ \pi(x, a, f) + \beta \sum_{x' \in \mathcal{X}} V^*(x' \mid f)\, \mathbf{P}(x' \mid x, a, f) \right\}. \tag{16}$$

Further, $V^*(x \mid f)$ is continuous in $f \in \mathfrak{F}_p$.

Finally, there exists at least one optimal oblivious strategy among all (possibly history-dependent, possibly
randomized) strategies; i.e., $\mathcal{P}(f)$ is nonempty for all $f \in \mathfrak{F}$. An oblivious strategy $\mu \in \mathfrak{M}_O$ is optimal
given $f$ if and only if $\mu(x)$ achieves the maximum on the right hand side of (16) for every $x \in \mathcal{X}$.

Proof. We have the following three properties: 

1. By the growth rate bound in Assumption 1 we have $\sup_a |\pi(x, a, f)| / W(x) \le K$ for all $x$.

2. We have:

$$\bar{W}(x) = \sup_{a \in A} \sum_{x'} \mathbf{P}(x' \mid x, a, f)\, W(x') \le (1 + \|x\|_\infty + M)^n,$$

since the increments are bounded (Assumption 1). Thus $\bar{W}(x)/W(x) \le (1 + M)^n$ for all $x$.

3. Finally, fix $\rho$ such that $0 < \rho < 1$ and let:

$$\bar{W}_k(x) = \sup_{\mu} \mathbb{E}[W(x_k) \mid x_0 = x, \mu],$$

where the state evolves according to $x_{t+1} \sim \mathbf{P}(\cdot \mid x_t, \mu(x_t), f)$. By bounded increments in Assumption 1,
we have:

$$\beta^k \bar{W}_k(x) \le \beta^k (1 + \|x\|_\infty + kM)^n \le \beta^k (1 + kM)^n W(x).$$

By choosing $k$ sufficiently large so that $\beta^k (1 + kM)^n < \rho$, we have:

$$\beta^k \bar{W}_k(x) \le \rho\, W(x).$$

Given (1)-(3), by standard arguments (see, e.g., Bertsekas 2007), it follows that $T_f$ is a $k$-stage $\rho$-contraction
with respect to the weighted sup norm, value iteration converges to $V^*(\cdot \mid f)$, the Bellman
equation holds, and any (stationary, nonrandomized) oblivious strategy that maximizes the right hand side
in (16) for each $x \in \mathcal{X}$ is optimal. Observe that since $V^*(\cdot \mid f) \in B(\mathcal{X})$ for any $f$, it follows that
$V^*(x \mid f) < \infty$ for all $x$. In fact, by Lemma 1, $|V^*(x \mid f)| \le C(x, 0)$ for all $x$.
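The $k$-stage contraction and the convergence of value iteration can be illustrated on a toy problem. The sketch below is not the paper's model: it truncates the state space to $\{0, \dots, N-1\}$, uses a made-up payoff with linear growth ($n = 1$) and a made-up kernel with increments bounded by $M = 1$, and all function names are hypothetical.

```python
# Sketch: Bellman operator T_f on a truncated state space, the weighted sup norm
# with W(x) = (1 + x)^n, and a check of the k-stage rho-contraction with
# rho = beta^k (1 + kM)^n. Payoff and kernel are illustrative assumptions.

N, beta, M, n = 60, 0.9, 1, 1
actions = (0, 1)
W = [float((1 + x) ** n) for x in range(N)]

def payoff(x, a):
    return x - 2.0 * a               # |payoff| <= 2 * (1 + x)

def kernel(x, a):
    """Returns {next_state: prob}; action 1 pushes the state up, action 0 lets it decay."""
    up, down = min(x + 1, N - 1), max(x - 1, 0)
    moves = [(up, 0.7), (down, 0.3)] if a == 1 else [(x, 0.2), (down, 0.8)]
    out = {}
    for y, p in moves:
        out[y] = out.get(y, 0.0) + p  # accumulate, since targets can coincide at the boundary
    return out

def bellman(F):
    return [
        max(payoff(x, a) + beta * sum(p * F[y] for y, p in kernel(x, a).items())
            for a in actions)
        for x in range(N)
    ]

def weighted_sup_norm(F):
    return max(abs(v) / w for v, w in zip(F, W))

def iterate(F, k):
    for _ in range(k):
        F = bellman(F)
    return F

k = 40
rho = beta ** k * (1 + k * M) ** n    # about 0.61, below one
F0, G0 = [0.0] * N, list(W)
Fk, Gk = iterate(F0, k), iterate(G0, k)
gap_before = weighted_sup_norm([u - v for u, v in zip(F0, G0)])
gap_after = weighted_sup_norm([u - v for u, v in zip(Fk, Gk)])

V = iterate([0.0] * N, 600)           # value iteration from the zero function
residual = max(abs(u - v) for u, v in zip(bellman(V), V))
```

A single application of the operator need not contract in the weighted norm, because $\beta (1 + M)^n$ can exceed one; this is why the argument works with $k$ stages.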

Next we show that $V^*(x \mid f)$ is continuous in $f$. Define $Z(x) = 0$ for all $x$, and let $V_f^{(\ell)} = T_f^{\ell} Z$. We
first show that $V_f^{(\ell)}(x)$ is continuous in $f$. To see this, we proceed by induction. The result is trivially true
at $\ell = 0$. Next, observe that $\pi(x, a, f)$ is jointly continuous in $a$ and $f$ for each fixed $x$ by Assumption 1.
Suppose $V_f^{(\ell)}(x)$ is continuous in $f$ for each $x$; then $V_f^{(\ell)}(x')\, \mathbf{P}(x' \mid x, a, f)$ is jointly continuous in $a$
and $f$ for each fixed $x, x'$. Since the kernel has bounded increments from Assumption 1, we conclude that
$\sum_{x'} V_f^{(\ell)}(x')\, \mathbf{P}(x' \mid x, a, f)$ is jointly continuous in $a$ and $f$ for each fixed $x$. It follows by Berge's maximum
theorem (Aliprantis and Border 2006) that $V_f^{(\ell+1)}(x)$ is continuous in $f$.

Fix $\epsilon > 0$. Since $T_f$ is a $k$-stage $\rho$-contraction in the weighted sup norm for every $f$, it follows that for
all sufficiently large $\ell$, for every $f$ there holds:

$$|V_f^{(\ell)}(x) - V^*(x \mid f)| \le W(x)\,\epsilon.$$

So now suppose that $f_n \to f$ in the $1$-$p$ norm. Since $V_f^{(\ell)}(x)$ is continuous in $f$, for all sufficiently large $n$
there holds:

$$|V_{f_n}^{(\ell)}(x) - V_f^{(\ell)}(x)| \le \epsilon.$$

Thus using the triangle inequality, for all sufficiently large $n$ we have:

$$|V^*(x \mid f) - V^*(x \mid f_n)| \le (2W(x) + 1)\,\epsilon.$$

Since $\epsilon$ was arbitrary it follows that the left hand side approaches zero as $n \to \infty$, as required. Finally,
observe that by a similar argument as above,

$$\sum_{x'} V^*(x' \mid f)\, \mathbf{P}(x' \mid x, a, f)$$

is a continuous function of $a$ for each fixed $x$ and $f$; since $\pi(x, a, f)$ is also continuous in $a$ for each fixed
$f$, the right hand side of (16) is continuous in $a$ for each fixed $f$. Since $A$ is compact, it follows that there
exists an optimal action at each state $x$, and thus there exists an optimal strategy given $f$. ■

E Existence: Proof 
E.l Closed Graph: Proof 

Throughout this subsection we suppose Assumption 1 holds. 

Lemma 3. For each $f$, $\mathcal{P}(f)$ is compact; further, the correspondence $\mathcal{P}$ is upper hemicontinuous on $\mathfrak{F}_p$.

Proof. By Assumption 1, $\pi(x, a, f)$ is jointly continuous in $a$ and $f$. Lemma 2 establishes that the optimal
oblivious value function $V^*(x \mid f)$ is continuous in $f$, and so as in the proof of that lemma, it follows that for
a fixed state $x$, $\pi(x, a, f) + \beta \sum_{x'} V^*(x' \mid f)\, \mathbf{P}(x' \mid x, a, f)$ is finite and jointly continuous in $a$ and $f$. Define
the set $\mathcal{P}_x(f) \subset A$ as the set of actions that achieve the maximum on the right hand side of (16); this is
nonempty as $A$ is compact (Assumption 1) and the right hand side is continuous in $a$. By Berge's maximum
theorem, for each $x$ the correspondence $\mathcal{P}_x$ is upper hemicontinuous with compact values (Aliprantis and
Border 2006).

By Lemma 2, $\mu \in \mathcal{P}(f)$ if and only if $\mu(x) \in \mathcal{P}_x(f)$ for each $x$. Note that we have endowed the set
of strategies with the topology of pointwise convergence. The range space of $\mathcal{P}$ is an infinite product of
the compact action space $A$ (Assumption 1) over the countable state space. Hence by Tychonoff's theorem
(Aliprantis and Border 2006), the range space of $\mathcal{P}$ is compact. Further, since $\mathcal{P}_x$ is compact-valued,
it follows that $\mathcal{P}$ is compact-valued. Since $\mathcal{P}_x(f)$ is compact-valued and upper hemicontinuous, the Closed
Graph Theorem ensures that $\mathcal{P}_x$ has a closed graph (Aliprantis and Border 2006). This in turn ensures that
$\mathcal{P}$ has closed graph; again by the Closed Graph Theorem, we conclude that $\mathcal{P}$ is upper hemicontinuous. ■

Proof of Proposition 4. Suppose $f_k \to f$ in the $1$-$p$ norm, and that $g_k \to g$ in the $1$-$p$ norm, where
$g_k \in \Phi(f_k)$ for all $k$. We must show that $g \in \Phi(f)$. For each $k$, let $\mu_k \in \mathcal{P}(f_k)$ be an optimal oblivious
strategy such that $g_k \in \mathcal{D}(\mu_k, f_k)$. As in the proof of Lemma 3, the range space of $\mathcal{P}$ is compact in the
topology of pointwise convergence; therefore, taking subsequences if necessary, we can assume without
loss of generality that $\mu_k$ converges to some strategy $\mu \in \mathfrak{M}_O$ pointwise. By upper hemicontinuity of $\mathcal{P}$
(Lemma 3), we have $\mu \in \mathcal{P}(f)$.

By definition of $\mathcal{D}$, it follows that for all $x$:

$$g_k(x) = \sum_{x'} g_k(x')\, \mathbf{P}(x \mid x', \mu_k(x'), f_k). \tag{17}$$

Since $\mathbf{P}(x \mid x', a, f)$ is jointly continuous in action and population state (Assumption 1), it follows that for
all $x$ and $x'$:

$$\mathbf{P}(x \mid x', \mu_k(x'), f_k) \to \mathbf{P}(x \mid x', \mu(x'), f)$$

as $k \to \infty$. Further, if $g_k \to g$ in the $1$-$p$ norm, then in particular, $g_k(x) \to g(x)$ for all $x$. Finally, observe
that for all $a$ and $f$, we have $\mathbf{P}(x \mid x', a, f) = 0$ for all states $x'$ such that $\|x' - x\|_\infty > M$, since increments
are bounded (Assumption 1). Thus:

$$\sum_{x'} g_k(x')\, \mathbf{P}(x \mid x', \mu_k(x'), f_k) \to \sum_{x'} g(x')\, \mathbf{P}(x \mid x', \mu(x'), f)$$

as $k \to \infty$. Taking the limit as $k \to \infty$ on both sides of (17) yields:

$$g(x) = \sum_{x'} g(x')\, \mathbf{P}(x \mid x', \mu(x'), f), \tag{18}$$

which establishes that $g \in \mathcal{D}(\mu, f)$. Since we had $\mu \in \mathcal{P}(f)$, we conclude $g \in \Phi(f)$, as required. ■

E.2 Convexity: Proof 

Proof of Proposition 5. Fix $f \in \mathfrak{F}_p$, and let $g_1, g_2$ be elements of $\Phi(f)$. Let $\mu_1, \mu_2 \in \mathcal{P}(f)$ be strategies
such that $g_i \in \mathcal{D}(\mu_i, f)$, $i = 1, 2$. Then for $i = 1, 2$ and all $x' \in \mathcal{X}$ we have:

$$g_i(x') = \sum_{x} g_i(x)\, \mathbf{P}(x' \mid x, \mu_i(x), f).$$

Fix $\delta$, $0 < \delta < 1$, and for each $x$, define $g(x)$ by:

$$g(x) = \delta g_1(x) + (1 - \delta) g_2(x).$$

We must show $g \in \Phi(f)$. Define a new strategy $\mu$ as follows: for each $x$ such that $g(x) > 0$,

$$\mu(x) = \frac{\delta g_1(x) \mu_1(x) + (1 - \delta) g_2(x) \mu_2(x)}{g(x)}.$$

For each $x$ such that $g(x) = 0$, let $\mu(x) = \mu_1(x)$.

We claim that $\mu \in \mathcal{P}(f)$, i.e., $\mu$ is an optimal oblivious strategy given $f$; and that $g \in \mathcal{D}(\mu, f)$, i.e., that
$g$ is an invariant distribution given strategy $\mu$ and population state $f$. This suffices to establish that $g \in \Phi(f)$.

To establish the claim, first observe that under Definition 9, the right hand side of (16) is linear in $a$.
Thus any convex combination of two optimal actions is also an optimal action. This establishes that for
every $x$, $\mu(x)$ achieves the maximum on the right hand side of (16); so we conclude $\mu \in \mathcal{P}(f)$.

Let $T = \{x : g(x) > 0\}$. Then:

$$\begin{aligned}
g(x') &= \delta g_1(x') + (1 - \delta) g_2(x') \\
&= \sum_{x} \Big( \delta g_1(x)\, \mathbf{P}(x' \mid x, \mu_1(x), f) + (1 - \delta) g_2(x)\, \mathbf{P}(x' \mid x, \mu_2(x), f) \Big) \\
&= \sum_{x} \sum_{s} \Big( \delta g_1(x) \mu_1(x)(s) + (1 - \delta) g_2(x) \mu_2(x)(s) \Big) \mathbf{P}(x' \mid x, s, f) \\
&= \sum_{x \in T} \sum_{s} g(x) \mu(x)(s)\, \mathbf{P}(x' \mid x, s, f).
\end{aligned}$$

The first equality is the definition of $g(x')$, and the second equality follows by expanding the invariant
distribution equations for $g_1$ and $g_2$. The third equality follows by expanding the sum over pure actions
$s$. Finally, in the last equality, we substitute the definition of $\mu(x)$, and we also observe that for $x \notin T$,
$g(x) = 0$ and therefore $g_1(x) = g_2(x) = 0$. Since $g(x) = 0$ for $x \notin T$, it follows that:

$$\sum_{x \notin T} \sum_{s} g(x) \mu(x)(s)\, \mathbf{P}(x' \mid x, s, f) = 0.$$

It follows that:

$$g(x') = \sum_{x} g(x)\, \mathbf{P}(x' \mid x, \mu(x), f),$$

as required. ■
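The invariance part of this argument can be checked numerically on a small chain. The sketch below uses a made-up three-state, two-action kernel and two arbitrary randomized strategies; it does not check optimality of the strategies, only that the mixture $g$ is invariant under the mixed strategy $\mu$ built in the proof. All names and numbers are hypothetical.

```python
# Sketch: if g1, g2 are invariant under mu1, mu2, then g = delta*g1 + (1-delta)*g2 is
# invariant under mu(x) = (delta*g1(x)*mu1(x) + (1-delta)*g2(x)*mu2(x)) / g(x).

def mixed_kernel(P, mu):
    n = len(P)
    return [
        [sum(mu[x][s] * P[x][s][y] for s in range(len(mu[x]))) for y in range(n)]
        for x in range(n)
    ]

def invariant_distribution(K, iters=100_000, tol=1e-14):
    n = len(K)
    g = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(g[x] * K[x][y] for x in range(n)) for y in range(n)]
        if max(abs(a - b) for a, b in zip(new, g)) < tol:
            return new
        g = new
    return g

def combine(g1, mu1, g2, mu2, delta):
    n = len(g1)
    g = [delta * g1[x] + (1 - delta) * g2[x] for x in range(n)]
    mu = []
    for x in range(n):
        if g[x] > 0:
            mu.append([
                (delta * g1[x] * mu1[x][s] + (1 - delta) * g2[x] * mu2[x][s]) / g[x]
                for s in range(len(mu1[x]))
            ])
        else:
            mu.append(list(mu1[x]))   # arbitrary choice where g puts no mass
    return g, mu

P = [
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]],
    [[0.6, 0.4, 0.0], [0.0, 0.5, 0.5]],
    [[0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
]
mu1 = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
mu2 = [[0.0, 1.0], [0.7, 0.3], [1.0, 0.0]]

g1 = invariant_distribution(mixed_kernel(P, mu1))
g2 = invariant_distribution(mixed_kernel(P, mu2))
g, mu = combine(g1, mu1, g2, mu2, delta=0.3)
K = mixed_kernel(P, mu)
error = max(abs(sum(g[x] * K[x][y] for x in range(3)) - g[y]) for y in range(3))
```

Note that the mixing weights in $\mu(x)$ depend on the state through $g_1(x)$ and $g_2(x)$; a state-independent mixture of $\mu_1$ and $\mu_2$ would not in general leave $g$ invariant.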

Lemma 4. Suppose Assumptions 1 and 3 hold. Then $V^*(\cdot \mid f)$ is strictly increasing for every $f \in \mathfrak{F}_p$, and
the right hand side of (16) is strictly concave in $a$.

Proof. Define $Z(x) = 0$ for all $x$, and let $V_f^{(\ell)} = T_f^{\ell} Z$. Observe that if $V_f^{(\ell)}$ is nondecreasing, then under
the conditions of the lemma, it follows that $V_f^{(\ell+1)}$ will be nondecreasing. Taking the limit as $\ell \to \infty$, we
conclude (from convergence of value iteration) that $V^*(\cdot \mid f)$ is nondecreasing, and thus the right hand side
of (16) is strictly increasing in $x$. From this it follows that in fact, $V^*(\cdot \mid f)$ is strictly increasing.

Since $V^*(\cdot \mid f)$ is strictly increasing, $\pi(x, a, f)$ is concave in $a$, and the kernel is stochastically concave
in $a$, with at least one of the last two strictly concave, it follows that the right hand side of (16) is strictly
concave in $a$. ■

Proof of Proposition 7. Under Assumptions 1 and 2, the optimal action in (16) can be shown to be unique
(see Doraszelski and Satterthwaite 2010). It follows that $\mathcal{P}(f)$ is a singleton.

From the preceding lemma, Assumptions 1 and 3 together also guarantee a unique optimal solution in
the right hand side of (16), for every $x \in \mathcal{X}$. Thus under either of these conditions the optimal strategy given
$f$ is unique, i.e., $\mathcal{P}(f)$ is a singleton. The result follows by Proposition 6.



E.3 Compactness: Proof 

Throughout this subsection we suppose $\mathcal{X} = \mathbb{Z}_+^d$ and that Assumptions 1 and 4 are in effect.

Lemma 5. Given $x' \ge x$, $x, x' \in \mathcal{X}$, $a \in A$, and $f \in \mathfrak{F}$, there exists a probability space with random
variables $\xi' \sim \mathbf{Q}(\cdot \mid x', a, f)$, $\xi \sim \mathbf{Q}(\cdot \mid x, a, f)$, such that $\xi' \le \xi$ almost surely, and $x' + \xi' \ge x + \xi$ almost
surely.

Proof. The proof uses a coupling argument. Let $U$ be a uniform random variable on $[0, 1]$. Let $F_\ell$ (resp., $F'_\ell$)
be the cumulative distribution function of $\mathbf{Q}_\ell(\cdot \mid x, a, f)$ (resp., $\mathbf{Q}_\ell(\cdot \mid x', a, f)$), and let $G_\ell$ (resp., $G'_\ell$) be the
cumulative distribution function of $\mathbf{P}_\ell(\cdot \mid x, a, f)$ (resp., $\mathbf{P}_\ell(\cdot \mid x', a, f)$). By Assumption 4, $\mathbf{P}_\ell(\cdot \mid x, a, f)$
is stochastically nondecreasing in $x$, and $\mathbf{Q}_\ell(\cdot \mid x, a, f)$ is stochastically nonincreasing in $x$. Thus for all $z$,
$F_\ell(z) \le F'_\ell(z)$, but for all $y$, $G_\ell(y) \ge G'_\ell(y)$; further, $G_\ell(y) = F_\ell(y - x_\ell)$ (and $G'_\ell(y) = F'_\ell(y - x'_\ell)$). Let
$\xi_\ell = \inf\{z_\ell : F_\ell(z_\ell) \ge U\}$, and let $\xi'_\ell = \inf\{z_\ell : F'_\ell(z_\ell) \ge U\}$. Then $\xi_\ell \ge \xi'_\ell$ for all $\ell$, i.e., $\xi \ge \xi'$. Rewriting
the definitions, we also have $x_\ell + \xi_\ell = \inf\{y_\ell : F_\ell(y_\ell - x_\ell) \ge U\}$, and $x'_\ell + \xi'_\ell = \inf\{y_\ell : F'_\ell(y_\ell - x'_\ell) \ge U\}$,
i.e., $x_\ell + \xi_\ell = \inf\{y_\ell : G_\ell(y_\ell) \ge U\}$, and $x'_\ell + \xi'_\ell = \inf\{y_\ell : G'_\ell(y_\ell) \ge U\}$. Thus $x_\ell + \xi_\ell \le x'_\ell + \xi'_\ell$ for all
$\ell$, i.e., $x' + \xi' \ge x + \xi$, as required. ■
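The quantile coupling in this proof is easy to reproduce in one dimension. The sketch below uses two made-up increment distributions, one for a state $x = 3$ and one for a larger state $x' = 4$, chosen so that increments are stochastically smaller at the larger state while next states are stochastically larger. The function name and the distributions are assumptions for illustration only.

```python
# Sketch: drive both increments with the same uniform draw u through the
# generalized inverse CDF, and check xi' <= xi and x' + xi' >= x + xi for every u.

def quantile(pmf, u):
    """Smallest support point z with F(z) >= u, for a pmf given as {value: prob}."""
    total = 0.0
    for z in sorted(pmf):
        total += pmf[z]
        if total >= u:
            return z
    return max(pmf)   # guards against rounding when u is very close to one

x, x_prime = 3, 4
increments_at_x = {-1: 0.2, 0: 0.3, 1: 0.5}         # Q(. | x, a, f)
increments_at_x_prime = {-1: 0.4, 0: 0.3, 1: 0.3}   # Q(. | x', a, f)

grid = [(i + 0.5) / 1000 for i in range(1000)]      # deterministic stand-in for U
coupled = [(quantile(increments_at_x, u), quantile(increments_at_x_prime, u)) for u in grid]
increments_ordered = all(xi_p <= xi for xi, xi_p in coupled)
states_ordered = all(x_prime + xi_p >= x + xi for xi, xi_p in coupled)
```

Using a common uniform variable makes both orderings hold pointwise rather than only in distribution, which is what the deviation argument in Lemma 6 relies on.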

Given a set $S$, define $\rho_\infty(x, S) = \inf_{y \in S} \|x - y\|_\infty$. Thus $\rho_\infty$ gives the $\infty$-norm distance to a set. We
have the following lemma.

Lemma 6. As $\|x\|_\infty \to \infty$, $\sup_{f \in \mathfrak{F}} \sup_{\mu \in \mathcal{P}(f)} \rho_\infty(\mu(x), A') \to 0$.

Proof. Suppose the statement of the lemma fails; then there exists $r > 0$ and a sequence $f_n \in \mathfrak{F}$, $\mu_n \in
\mathcal{P}(f_n)$, and $x_n$ (where $\|x_n\|_\infty \to \infty$) such that $\rho_\infty(\mu_n(x_n), A') \ge r$ for all $n$. We use this fact to construct
a profitable deviation from the policy $\mu_n$, for sufficiently large $n$.

Observe that by Assumption 4, there must exist $a'_n \in A'$ with $a'_n \le \mu_n(x_n)$, such that:

$$\pi(x_n, a'_n, f_n) - \pi(x_n, \mu_n(x_n), f_n) \ge \kappa(\|a'_n - \mu_n(x_n)\|_\infty) \ge \kappa(r) > 0,$$

where the last inequality follows since $\kappa$ is strictly increasing with $\kappa(0) = 0$. Importantly, note the bound
on the right hand side is a constant, independent of $n$.

Let $x_{0,n} = x_n$, and let $x_{t,n}$ and $a_{t,n}$ denote the state and action sequence realized under $\mu_n$, starting from
$x_{0,n}$, under the kernel $\mathbf{P}(\cdot \mid x, a, f_n)$. We consider a deviation from $\mu_n$, where at time 0, instead of playing
$a_{0,n} = \mu_n(x_n)$, the agent plays $a'_{0,n} = a'_n$; and then at all times in the future, the agent follows the same
actions as the original sequence, i.e., $a'_{t,n} = a_{t,n}$. Let $x'_{t,n}$ denote the resulting state sequence.

Since the kernel is stochastically nondecreasing in $a$, and $a'_n \le a_{0,n}$, it follows that there exists a
common probability space together with increments $\xi_{0,n}, \xi'_{0,n}$, such that $\xi_{0,n} \sim \mathbf{Q}(\cdot \mid x_n, a_{0,n}, f_n)$, $\xi'_{0,n} \sim
\mathbf{Q}(\cdot \mid x_n, a'_n, f_n)$, and $\xi'_{0,n} \le \xi_{0,n}$ almost surely. Thus we can couple together $x_{1,n}$ and $x'_{1,n}$, by letting
$x_{1,n} = x_n + \xi_{0,n}$, and $x'_{1,n} = x_n + \xi'_{0,n}$. In particular, observe that with these definitions we have $x_{1,n} \ge x'_{1,n}$.
Let $\Delta_n = \xi_{0,n} - \xi'_{0,n} \ge 0$. Note that $\|\Delta_n\|_\infty \le 2M$, by Assumption 1 (bounded increments).

Next, it follows from Lemma 5 that there exists a probability space with random variables $\xi_{1,n}, \xi'_{1,n}$
such that $\xi_{1,n} \sim \mathbf{Q}(\cdot \mid x_{1,n}, a_{1,n}, f_n)$ and $\xi'_{1,n} \sim \mathbf{Q}(\cdot \mid x'_{1,n}, a_{1,n}, f_n)$, $\xi_{1,n} \le \xi'_{1,n}$ almost surely, and yet
$x_{1,n} + \xi_{1,n} \ge x'_{1,n} + \xi'_{1,n}$ almost surely. Thus we can couple together $x_{2,n}$ and $x'_{2,n}$, by letting $x_{2,n} =
x_{1,n} + \xi_{1,n}$, and $x'_{2,n} = x'_{1,n} + \xi'_{1,n}$. Proceeding inductively, it can be shown that there exists a joint
probability measure under which $0 \le x_{t,n} - x'_{t,n} \le \Delta_n$, almost surely, for all $t \ge 1$ (where the inequalities
are interpreted coordinatewise); this follows by a standard application of the Kolmogorov extension theorem.

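We now compare the payoffs obtained under these two sequences. We have:

$$\begin{aligned}
E\Big[\sum_{t \ge 0} \beta^t \big(\pi(x_{t,n}, a_{t,n}, f_n) - \pi(x'_{t,n}, a'_{t,n}, f_n)\big)\Big]
&= \pi(x_n, \mu_n(x_n), f_n) - \pi(x_n, a'_n, f_n) \\
&\quad + E\Big[\sum_{t \ge 1} \beta^t \big(\pi(x_{t,n}, a_{t,n}, f_n) - \pi(x'_{t,n}, a_{t,n}, f_n)\big)\Big] \\
&\le -\kappa(r) + E\Big[\sum_{t \ge 1} \beta^t \sup_{\delta \ge 0 : \|\delta\|_\infty \le 2M} \ \sup_{a, f} \big(\pi(x_{t,n}, a, f) - \pi(x_{t,n} - \delta, a, f)\big)\Big].
\end{aligned}$$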



Since increments are bounded (Assumption 4), in time $t$, the maximum distance the state could have moved in each coordinate from the initial state $x$ is bounded by $tM$. Thus if $x_{0,n} = x_n$, then:

$$\sup_{\delta \ge 0 : \|\delta\|_\infty \le 2M} \ \sup_{a,f} \big(\pi(x_{t,n}, a, f) - \pi(x_{t,n} - \delta, a, f)\big) \le \sup_{\substack{\delta \ge 0,\, \epsilon : \|\delta\|_\infty \le 2M, \\ \|\epsilon\|_\infty \le tM}} \ \sup_{a,f} \big(\pi(x_n + \epsilon, a, f) - \pi(x_n + \epsilon - \delta, a, f)\big).$$

Let $A_{t,n}$ denote the right hand side of the preceding equation; note that this is a deterministic quantity, and that the supremum is over a finite set. Thus from Assumption 4, we have $\limsup_{n \to \infty} A_{t,n} \le 0$.

Finally, observe that since $\limsup_{\|x\|_\infty \to \infty} \sup_{a,f} (\pi(x + \delta, a, f) - \pi(x, a, f)) \le 0$, it follows that:

$$\sup_{y \in \mathbb{Z}_+^d,\ \delta \ge 0 : \|\delta\|_\infty \le 2M} \ \sup_{a,f} \big(\pi(y, a, f) - \pi(y - \delta, a, f)\big) < \infty.$$

We denote the left hand side of the preceding inequality by $D$. Note that this is a constant independent of $n$.
Combining our arguments, we have that for all sufficiently large $n$, there holds:

$$E\Big[\sum_{t \ge 0} \beta^t \big(\pi(x_{t,n}, a_{t,n}, f_n) - \pi(x'_{t,n}, a'_{t,n}, f_n)\big)\Big] \le -\kappa(r) + \sum_{t=1}^{T-1} \beta^t A_{t,n} + \frac{\beta^T D}{1 - \beta}.$$

By taking $T$ sufficiently large, we can ensure that the last term on the right hand side is strictly less than $\kappa(r)/2$; and by then taking $n$ sufficiently large, we can ensure that the second term on the right hand side is also strictly less than $\kappa(r)/2$. Thus for sufficiently large $n$, we conclude that the left hand side is negative, contradicting optimality of $\mu_n$. The lemma follows. ■
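The choice of the horizon $T$ in the last step only uses the geometric decay of the discounted tail $\beta^T D / (1 - \beta)$. As a small illustration, the sketch below finds the smallest such $T$ by direct search. The function names and the numerical values in the usage note are hypothetical and are not part of the paper.

```python
def discounted_tail(beta, D, T):
    """Bound on the payoff difference accumulated from period T onward."""
    return beta ** T * D / (1.0 - beta)


def min_horizon(beta, D, kappa_r):
    """Smallest T >= 1 with beta**T * D / (1 - beta) < kappa_r / 2."""
    if not 0.0 < beta < 1.0:
        raise ValueError("discount factor must lie strictly between 0 and 1")
    if kappa_r <= 0.0:
        raise ValueError("kappa(r) must be strictly positive")
    T = 1
    # The tail decays geometrically in T, so this loop terminates.
    while discounted_tail(beta, D, T) >= kappa_r / 2.0:
        T += 1
    return T
```

For instance, `min_horizon(0.9, 10.0, 1.0)` returns the first period after which the tail contributes less than half of the one-shot gain $\kappa(r)$; the remaining terms are then handled by taking $n$ large.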



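Lemma 7. There exists $\epsilon > 0$ and $K$ such that for all $\ell$ and all $x$ with $x_\ell \ge K$, $\sup_f \sup_{\mu \in \mathcal{P}(f)} \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid x, \mu(x), f) < -\epsilon$.

Proof. Fix $\epsilon' > 0$ and $K'$ so that for all $\ell$ and all $x'$ with $x'_\ell \ge K'$, $\sup_{a' \in A'} \sup_f \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid x', a', f) < -\epsilon'$; such constants exist by the last part of Assumption 4. Observe that since $A$ is compact and $\sup_f \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid x, a, f)$ is continuous in $a$ (Assumption 4), it follows that $\sup_f \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid x, a, f)$ is in fact uniformly continuous in $a \in A$. Let $e^{(\ell)}$ denote the $\ell$th standard basis vector (i.e., $e^{(\ell)}_{\ell'} = 0$ for $\ell' \ne \ell$ and $e^{(\ell)}_\ell = 1$). By uniform continuity, we can conclude there must exist a $\delta_\ell > 0$ such that if $\|a - a'\|_\infty < \delta_\ell$, then:

$$\Big| \sup_f \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid K' e^{(\ell)}, a, f) - \sup_f \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid K' e^{(\ell)}, a', f) \Big| < \epsilon'/2.$$

Note in particular, if $\rho_\infty(a, A') < \delta_\ell$, then there exists $a' \in A'$ with $\|a - a'\|_\infty < \delta_\ell$. By our choice of $\epsilon'$ we have $\sup_f \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid K' e^{(\ell)}, a, f) < -\epsilon'/2$. Now let $\delta = \min\{\delta_1, \ldots, \delta_d\}$. Since the increment kernel is stochastically nonincreasing in $x$, it follows that if $\rho_\infty(a, A') < \delta$ and $x_\ell \ge K'$, then $\sup_f \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid x, a, f) < -\epsilon'/2$. Since $\sup_f \sup_{\mu \in \mathcal{P}(f)} \rho_\infty(\mu(x), A') \to 0$ as $\|x\|_\infty \to \infty$ (Lemma 6), we can choose $K \ge K'$ large enough that $\rho_\infty(\mu(x), A') < \delta$ whenever $\|x\|_\infty \ge K$; the result follows if we let $\epsilon = \epsilon'/2$. ■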



Lemma 8. For every $f \in \mathfrak{F}$, $\Phi(f)$ is nonempty.

Proof. As described in the discussion of Section 5.3, it suffices to show that the state Markov chain induced by an optimal oblivious strategy possesses at least one invariant distribution, i.e., that $\mathcal{D}(\mu, f)$ is nonempty, where $\mu$ is an optimal oblivious strategy given $f$.

We first show that for every $f$ and every $\mu \in \mathcal{P}(f)$, the Markov chain on $X$ induced by $\mu$ and $f$ has at least one closed class. Let $S = \{x : \|x\|_\infty \le K + M\}$. By Lemma 7, if $x \notin S$, then there exists some state $x'$ with $P(x' \mid x, \mu(x), f) > 0$ such that $x'_\ell < x_\ell - \epsilon$ for all $\ell$ where $x_\ell \ge K$. On the other hand, since increments are bounded, for any $\ell$ where $x_\ell < K$, we have $x'_\ell < K + M$. Applying this fact inductively, we find that for any $x \notin S$, there must exist a positive probability sequence of states from $x$ to $S$; i.e., a sequence $y_0, y_1, y_2, \ldots, y_T$ such that $y_0 = x$, $y_T \in S$, and $P(y_t \mid y_{t-1}, \mu(y_{t-1}), f) > 0$ for all $t$. We say that $S$ is reachable from $x$.

So now suppose the chain induced by $\mu$ and $f$ has no closed class. Fix $x_0 \in S$. Since the class containing $x_0$ is not closed, there must exist a state $x'$ reachable from $x_0$ with positive probability, such that the chain never returns to $x_0$ starting from $x'$. If $x' \in S$, let $x_1 = x'$. If $x' \notin S$, then using the argument in the preceding paragraph, there must exist a state $x_1 \in S$ reachable from $x'$. Arguing inductively, we can construct a sequence of states $x_0, x_1, x_2, \ldots$ where $x_t \in S$ for all $t$, and yet $x_0, \ldots, x_{t-1}$ are not reachable from $x_t$. But $S$ is finite, so at least one state must repeat in this sequence, contradicting the construction. We conclude that the chain must have at least one closed class.

To complete the proof, we use a Foster-Lyapunov argument. Let $U(x) = \sum_\ell x_\ell^2$. Then $\{x \in X : U(x) \le R\}$ is finite for all $R$. So now let $\omega = (2dKM + dM^2 + 1)/(2\epsilon)$, and suppose $\|x\|_\infty > \max\{\omega, K\}$. We reason as follows:

$$\begin{aligned}
\sum_{x'} U(x') P(x' \mid x, \mu(x), f) &= U(x) + 2 \sum_\ell x_\ell \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid x, \mu(x), f) + \sum_\ell \sum_{z_\ell} z_\ell^2 Q_\ell(z_\ell \mid x, \mu(x), f) \\
&\le U(x) + 2 \sum_{\ell : x_\ell < K} M x_\ell - 2 \sum_{\ell : x_\ell \ge K} \epsilon x_\ell + dM^2 \le U(x) - 1.
\end{aligned}$$

The first equality follows by definition of $Q$ and $U$, and multiplicative separability of $Q$. The next step follows since increments are bounded (Assumption 4), and by applying Lemma 7 for $x_\ell \ge K$. The last inequality follows from the fact that the state space is $d$-dimensional, $\|x\|_\infty > \max\{K, \omega\}$, and by definition of $\omega$. Since increments are bounded, it is trivial that for every $R$:

$$\sup_{\|x\|_\infty \le R} \Big( \sum_{x'} U(x') P(x' \mid x, \mu(x), f) - U(x) \Big) < \infty.$$

It follows by the Foster-Lyapunov criterion that every closed class of the Markov chain induced by $\mu$ is positive recurrent, as required (Hajek 1982, Meyn and Tweedie 1993, Glynn and Zeevi 2006). ■
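The drift inequality above can be checked numerically in a toy case. The sketch below assumes a one-dimensional state ($d = 1$), increments in $\{-1, 0, 1\}$ (so $M = 1$), and a hypothetical increment law with mean $-0.4$ beyond the threshold $K$, so that Lemma 7 holds with $\epsilon = 0.3$. All of these numbers are assumptions made for illustration only.

```python
def drift_of_square(x, increment_law):
    """One-step drift E[(x + z)^2] - x^2 of U(x) = x^2 in one dimension.

    increment_law maps each increment z to its probability.
    """
    return sum(p * ((x + z) ** 2 - x ** 2) for z, p in increment_law.items())


def omega(d, K, M, eps):
    """Threshold (2dKM + dM^2 + 1) / (2 eps) from the proof of Lemma 8."""
    return (2 * d * K * M + d * M ** 2 + 1) / (2 * eps)


# Hypothetical parameters: d = 1, K = 5, M = 1, eps = 0.3.
D_DIM, K, M, EPS = 1, 5, 1, 0.3
LAW_ABOVE_K = {-1: 0.6, 0: 0.2, 1: 0.2}  # mean increment -0.4 < -EPS

if __name__ == "__main__":
    w = omega(D_DIM, K, M, EPS)
    for x in range(int(w) + 1, int(w) + 4):
        print(x, drift_of_square(x, LAW_ABOVE_K))
```

For every state above $\max\{\omega, K\}$ the computed drift is at most $-1$, which is the negative-drift condition the Foster-Lyapunov criterion needs outside a finite set.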

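Lemma 9. For every $\eta \in \mathbb{Z}_+$, $\sup_f \sup_{\phi \in \Phi(f)} \sum_x \|x\|_\eta^\eta \phi(x) < \infty$.

Proof. We again use a Foster-Lyapunov argument. We proceed by induction; the claim is clearly true if $\eta = 0$. So assume the claim is true up to $\eta - 1$; in particular, define:

$$\alpha_k = \sup_f \sup_{\phi \in \Phi(f)} \sum_x \|x\|_k^k \phi(x)$$

for $k = 0, \ldots, \eta - 1$. Fix $f$, and let $\mu \in \mathcal{P}(f)$ be an optimal oblivious strategy given $f$. The preceding lemma establishes that the Markov chain induced by $\mu$ possesses at least one invariant distribution. Let $U(x) = \sum_\ell x_\ell^{\eta+1}$. Then we have:

$$\begin{aligned}
\sum_{x'} U(x') P(x' \mid x, \mu(x), f) &= \sum_\ell \sum_{z_\ell} (x_\ell + z_\ell)^{\eta+1} Q_\ell(z_\ell \mid x, \mu(x), f) \\
&= \sum_\ell \sum_{z_\ell} \sum_{k=0}^{\eta+1} \binom{\eta+1}{k} x_\ell^k z_\ell^{\eta+1-k} Q_\ell(z_\ell \mid x, \mu(x), f) \\
&= U(x) + (\eta + 1) \sum_\ell x_\ell^\eta \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid x, \mu(x), f) \\
&\quad + \sum_\ell \sum_{z_\ell} \sum_{k=0}^{\eta-1} \binom{\eta+1}{k} x_\ell^k z_\ell^{\eta+1-k} Q_\ell(z_\ell \mid x, \mu(x), f).
\end{aligned}$$

Define $g(x)$ as:

$$g(x) = \sum_{k=0}^{\eta-1} \binom{\eta+1}{k} M^{\eta+1-k} \sum_\ell x_\ell^k.$$

By the inductive hypothesis,

$$\gamma = \sup_f \sup_{\phi \in \Phi(f)} \sum_x g(x) \phi(x) < \infty.$$

Further, by Lemma 7, for all $\ell$ and all $x$ such that $x_\ell \ge K$, we have:

$$\sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid x, \mu(x), f) < -\epsilon.$$

Define $h(x)$ as:

$$h(x) = -(\eta + 1) M \sum_{\ell : x_\ell < K} x_\ell^\eta + \epsilon (\eta + 1) \sum_{\ell : x_\ell \ge K} x_\ell^\eta.$$

It follows that:

$$\sum_{x'} U(x') P(x' \mid x, \mu(x), f) - U(x) \le -h(x) + g(x).$$

Now fix any distribution $\phi \in \mathcal{D}(\mu, f)$. Since the Markov chain induced by $\mu$ and $f$ must be irreducible on the support of $\phi$, it follows by the Foster-Lyapunov criterion (Meyn and Tweedie 1993) that:

$$\sum_x h(x) \phi(x) \le \sum_x g(x) \phi(x) \le \gamma.$$

Rearranging terms, we conclude that:

$$\epsilon (\eta + 1) \sum_x \Big( \sum_{\ell : x_\ell \ge K} x_\ell^\eta \Big) \phi(x) \le \gamma + (\eta + 1) M d K^\eta.$$

Thus:

$$\sum_x \|x\|_\eta^\eta \phi(x) \le \frac{\gamma + (\eta + 1) M d K^\eta}{\epsilon (\eta + 1)} + d K^\eta.$$

(Recall that the sum is only over $x \in \mathbb{Z}_+^d$.) Since the right hand side is finite and independent of $f$ and $\phi$, the result follows. ■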

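Proof of Proposition 9. We have already established that $\Phi(f)$ is nonempty for all $f \in \mathfrak{F}$ in Lemma 8. Define $B = \sup_f \sup_{\phi \in \Phi(f)} \sum_x \|x\|_{p+1}^{p+1} \phi(x) < \infty$, where the inequality is the result of Lemma 9.

We define the set $\mathfrak{C} = \{ f \in \mathfrak{F} : \sum_x \|x\|_{p+1}^{p+1} f(x) \le B \}$. By the preceding observation, $\Phi(\mathfrak{F}) \subset \mathfrak{C}$.

It is clear that $\mathfrak{C}$ is nonempty and convex. It remains to be shown that $\mathfrak{C}$ is compact in the 1-p-norm. It is straightforward to check that $\mathfrak{C}$ is complete; we show that $\mathfrak{C}$ is totally bounded, thus establishing compactness.

Fix $\epsilon > 0$. Choose $K_\epsilon$ so that $B/K_\epsilon < \epsilon$. Then for all $f \in \mathfrak{C}$:

$$\sum_{x : \|x\|_\infty > K_\epsilon} \|x\|_p^p f(x) \le \frac{B}{K_\epsilon} < \epsilon. \tag{19}$$

Let $S_\epsilon = \{x : \|x\|_\infty \le K_\epsilon\}$ and let $\mathfrak{C}_\epsilon$ be the projection of $\mathfrak{C}$ onto $S_\epsilon$; i.e.,

$$\mathfrak{C}_\epsilon = \{ g \in \mathbb{R}^{S_\epsilon} : \exists f \in \mathfrak{C} \text{ with } g(x) = f(x)\ \forall x \in S_\epsilon \}.$$

It is straightforward to check that $\mathfrak{C}_\epsilon$ is a compact subset of the finite-dimensional space $\mathbb{R}^{S_\epsilon}$; so let $f_1, \ldots, f_k \in \mathfrak{C}_\epsilon$ be an $\epsilon$-cover of $\mathfrak{C}_\epsilon$ (i.e., $\mathfrak{C}_\epsilon$ is covered by the balls around $f_1, \ldots, f_k$ of radius $\epsilon$ in the 1-p-norm). Then it follows that $f_1, \ldots, f_k$ is a $2\epsilon$-cover of $\mathfrak{C}$, since (19) bounds the tail of any $f \in \mathfrak{C}$ by $\epsilon$. This establishes that $\mathfrak{C}$ is totally bounded in the 1-p-norm, as required. ■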
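The tail bound used for total boundedness is a Markov-type inequality: a uniform bound on a higher moment controls the weighted mass far from the origin. The sketch below checks this in a simplified one-dimensional setting, where $x > K$ implies $x^p \le x^{p+1}/K$. The probability mass function and the helper name are hypothetical and only serve to illustrate the inequality.

```python
def tail_and_bound(pmf, p, K):
    """Return (tail, B / K) for a distribution on nonnegative integers.

    pmf maps each state x to its probability (finite support).
    tail is the p-th moment mass beyond K; B is the (p+1)-th moment.
    """
    B = sum((x ** (p + 1)) * q for x, q in pmf.items())
    tail = sum((x ** p) * q for x, q in pmf.items() if x > K)
    return tail, B / K


if __name__ == "__main__":
    # Hypothetical truncated geometric distribution on {0, ..., 59}.
    pmf = {x: 0.5 ** (x + 1) for x in range(60)}
    for K in (1, 5, 10):
        print(K, tail_and_bound(pmf, 2, K))
```

As $K$ grows the bound $B/K$ shrinks, so the mass outside a large box is uniformly small over the whole set, which is what reduces the covering argument to a finite-dimensional one.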

E.4 Finite Actions 

We conclude by briefly discussing how the proof of Proposition 9 may be adapted in the case of finite action spaces (cf. Definition 9). Suppose that the set of pure actions $S$ is finite. We now show that as long as Assumption 4 holds with respect to pure actions, i.e., with $A$ replaced by $S$, Proposition 9 continues to hold.

Lemma 5 follows as before, except with $A$ replaced by $S$. Lemma 6 follows the same argument if we restrict attention to pure strategies $\mu$, i.e., strategies that take a pure action in every state. Let $\hat{\mathcal{P}}(f)$ denote the set of optimal pure strategies given $f$. Lemma 6 then yields that as $\|x\|_\infty \to \infty$:

$$\sup_f \sup_{\mu \in \hat{\mathcal{P}}(f)} \rho_\infty(\mu(x), A') \to 0.$$

Since $A' \subset S$, it is finite as well. It follows that there exists $L$ such that for all $x$ with $\|x\|_\infty \ge L$, for all $f$, and all $\mu \in \hat{\mathcal{P}}(f)$, we have $\mu(x) \in A'$. From this and Assumption 4 the result of Lemma 7 holds for $\mu \in \hat{\mathcal{P}}(f)$, i.e., there exists $\epsilon > 0$ such that for all $\ell$ and all $x$ with $x_\ell \ge K'$,

$$\sup_f \sup_{\mu \in \hat{\mathcal{P}}(f)} \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid x, \mu(x), f) < -\epsilon.$$

To complete our proof, we need only note that the set of all optimal oblivious strategies $\mathcal{P}(f)$ can be obtained by pointwise convex combinations of optimal pure oblivious strategies; this follows from Bellman's equation and the fact that the payoff is linear in the mixed action. Thus we also have:

$$\sup_f \sup_{\mu \in \mathcal{P}(f)} \sum_{z_\ell} z_\ell Q_\ell(z_\ell \mid x, \mu(x), f) < -\epsilon.$$

The remainder of the proof follows as before.

F AME: Proof 

Throughout this section we suppose Assumption 1 holds. We begin by defining the following sets.

Definition 11. For every $x \in X$, define

$$X_x = \{ z \in X : P(x \mid z, a, f) > 0 \text{ for some } a \in A \text{ and for some } f \in \mathfrak{F}_p \}. \tag{20}$$

Also define $X_{x,t}$ as

$$X_{x,t} = \{ z \in X : \|z\|_\infty \le \|x\|_\infty + tM \}. \tag{21}$$

Thus, $X_x$ is the set of all initial states that can result in the final state $x$. Since the increments are bounded (Assumption 1), for every $x \in X$, the set $X_x$ is finite. The set $X_{x,t}$ is a superset of all possible states that can be reached at time $t$ starting from state $x$ (since the increments are uniformly bounded over action $a$ and distribution $f$); note that $X_{x,t}$ is finite as well.

The following key lemma establishes that as the number of players grows large, the population empirical 
distribution in a game with finitely many players approaches the limiting SE population. The result is 
similar in spirit to related results on mean field limits of interacting particle systems, cf. Sznitman (1991); 
there the main insight is that, under appropriate conditions, the stochastic evolution of a finite-dimensional 
interacting particle system approaches the deterministic mean field limit over finite time horizons. Our 
model introduces two sources of complexity. First, agents' state transitions are coupled, so the population 
state Markov process is not simply the aggregation of independent agent state dynamics. Second, our state 
space is unbounded, so additional care is required to ensure the tail of the population state distribution is 
controlled in games with a large but finite number of players. This is where the light tail condition plays a 
key role. Our proof proceeds by induction over time periods. 

Lemma 10. Let $(\mu, f)$ be a stationary equilibrium with $f \in \mathfrak{F}_p$. Consider an $m$-player game. Let $x^{(m)}_{i,0} = x_0$ and suppose the initial state of every player (other than player $i$) is independently sampled from the distribution $f$. That is, suppose $x^{(m)}_{j,0} \sim f$ for all $j \ne i$; let $f^{(m)} \in \mathfrak{F}^{(m)}$ denote the initial population state. Let $a^{(m)}_{i,t}$ be any sequence of (possibly random, possibly history dependent) actions. Suppose players' states evolve as $x^{(m)}_{i,t+1} \sim P(\cdot \mid x^{(m)}_{i,t}, a^{(m)}_{i,t}, f^{(m)}_{-i,t})$ and for all $j \ne i$, as $x^{(m)}_{j,t+1} \sim P(\cdot \mid x^{(m)}_{j,t}, \mu(x^{(m)}_{j,t}), f^{(m)}_{-j,t})$. Then, for every initial state $x_0$, for all times $t$, $\| f^{(m)}_{-i,t} - f \|_{1\text{-}p} \to 0$ almost surely as $m \to \infty$.



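(Note that the convergence is almost surely in the randomness associated with the initial population state.)

Proof. Note that $f \in \mathfrak{F}_p$ and hence $\|f\|_{1\text{-}p} < \infty$. Thus, given any $\epsilon > 0$, there exists a finite set $C_{\epsilon,f}$ such that:

$$\sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f(x) < \epsilon. \tag{22}$$

At $t = 0$, we have

$$f^{(m)}_{-i,0}(x) = \frac{1}{m-1} \sum_{j=1}^{m-1} 1\{X_{j,0} = x\},$$

where $X_{j,0}$ are i.i.d. random variables distributed according to the distribution $f$. Define:

$$Y_j = \|X_{j,0}\|_p^p \, 1\{X_{j,0} \notin C_{\epsilon,f}\}.$$

Note that the $Y_j$ are i.i.d. random variables, with:

$$E[Y_j] = \sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f(x).$$

Further, observe that:

$$\sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f^{(m)}_{-i,0}(x) = \frac{1}{m-1} \sum_{j=1}^{m-1} Y_j.$$

Thus by the strong law of large numbers, almost surely as $m \to \infty$,

$$\sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f^{(m)}_{-i,0}(x) \to \sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f(x) < \epsilon.$$

Now observe that:

$$\| f^{(m)}_{-i,0} - f \|_{1\text{-}p} \le \sum_{x \in C_{\epsilon,f}} \|x\|_p^p \, | f^{(m)}_{-i,0}(x) - f(x) | + \sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f^{(m)}_{-i,0}(x) + \sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f(x).$$

Each of the second and third terms on the right hand side is almost surely less than $\epsilon$ for sufficiently large $m$. For the first term, observe that $| f^{(m)}_{-i,0}(x) - f(x) | \to 0$ almost surely, again by the strong law of large numbers (since $f^{(m)}_{-i,0}(x)$ is the sample average of $m - 1$ Bernoulli random variables with parameter $f(x)$). Thus the first term approaches zero almost surely as $m \to \infty$ by the bounded convergence theorem. Since $\epsilon$ was arbitrary, this proves that $\| f^{(m)}_{-i,0} - f \|_{1\text{-}p} \to 0$ almost surely as $m \to \infty$.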
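The base case of the induction can be illustrated by simulation. The sketch below draws $m - 1$ i.i.d. initial states from a hypothetical one-dimensional geometric distribution $f(x) = (1 - q) q^x$ and computes the weighted distance $\sum_x x^p |f^{(m)}(x) - f(x)|$, truncating the sum over $f$ at an assumed cutoff `x_max` beyond which the geometric mass is negligible. The distribution, the cutoff, and the function names are assumptions made for illustration, not objects defined in the paper.

```python
import math
import random


def geometric_pmf(q, x):
    """f(x) = (1 - q) q^x on the nonnegative integers."""
    return (1.0 - q) * q ** x


def sample_geometric(q, rng):
    """Inverse-transform sample with P(X >= x) = q^x."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(q))


def one_p_distance(m, q=0.5, p=1, x_max=200, seed=0):
    """Weighted distance between the empirical law of m - 1 draws and f."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(m - 1):
        x = sample_geometric(q, rng)
        counts[x] = counts.get(x, 0) + 1
    support = set(range(x_max + 1)) | set(counts)
    return sum((x ** p) * abs(counts.get(x, 0) / (m - 1) - geometric_pmf(q, x))
               for x in support)


if __name__ == "__main__":
    for m in (101, 1001, 10001):
        print(m, one_p_distance(m))
```

The printed distances should shrink as $m$ grows, in line with the almost sure convergence at $t = 0$; the weights $x^p$ are what make the tail control in (22) necessary.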



We now use an induction argument; let us assume that $\| f^{(m)}_{-i,\tau} - f \|_{1\text{-}p} \to 0$ almost surely as $m \to \infty$ for all times $\tau \le t$. From the definition of $f^{(m)}_{-i,t+1}$ we have:

$$f^{(m)}_{-i,t+1}(y) = \frac{1}{m-1} \sum_{j \ne i} 1\{x^{(m)}_{j,t+1} = y\},$$

where $x^{(m)}_{j,t+1} \sim P(\cdot \mid x^{(m)}_{j,t}, \mu(x^{(m)}_{j,t}), f^{(m)}_{-j,t})$ for all $j \ne i$. Note that if two players have the same state, then the population state from their viewpoint is identical. That is, if $x^{(m)}_{j,t} = x^{(m)}_{k,t}$, then $f^{(m)}_{-j,t}(y) = f^{(m)}_{-k,t}(y)$ for all $y \in X$. We can thus redefine the population state from the viewpoint of a player at a particular state. Let $\hat{f}^{(x,m)}_t$ be the population state at time $t$ from the viewpoint of a player at state $x$. Then, if $x^{(m)}_{j,t} = x^{(m)}_{k,t} = x$, then for all $y \in X$, $f^{(m)}_{-j,t}(y) = f^{(m)}_{-k,t}(y) = \hat{f}^{(x,m)}_t(y)$. Without loss of generality, we assume $m > 1$. Let $\eta^{(m)}_{-i,t}(x)$ be the total number of players (excluding player $i$) that have their state at time $t$ as $x$, i.e., $\eta^{(m)}_{-i,t}(x) = (m - 1) f^{(m)}_{-i,t}(x)$. Note that $\eta^{(m)}_{-i,t}(x) = 0$ if and only if $f^{(m)}_{-i,t}(x) = 0$. We can now write

$$f^{(m)}_{-i,t+1}(y) = \frac{1}{m-1} \sum_{x \in X} \sum_{j=1}^{\eta^{(m)}_{-i,t}(x)} 1\{Y^{(m)}_{j,x,t} = y\} = \sum_{x \in X_y} f^{(m)}_{-i,t}(x) \Bigg( \frac{1}{\eta^{(m)}_{-i,t}(x)} \sum_{j=1}^{\eta^{(m)}_{-i,t}(x)} 1\{Y^{(m)}_{j,x,t} = y\} \Bigg), \tag{23}$$

where the last equality follows from Definition 11. Here, $Y^{(m)}_{j,x,t}$ are random variables that are independently drawn according to the transition kernel $P(\cdot \mid x, \mu(x), \hat{f}^{(x,m)}_t)$. Note that if $\eta^{(m)}_{-i,t}(x) = 0$, we interpret the term inside the parentheses as zero.
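Let us now look at $\hat{f}^{(x,m)}_t$. We have

$$\hat{f}^{(x,m)}_t(z) = f^{(m)}_{-i,t}(z) + \frac{1}{m-1} 1\{x^{(m)}_{i,t} = z\} - \frac{1}{m-1} 1\{z = x\}.$$

Consider $\| \hat{f}^{(x,m)}_t - f \|_{1\text{-}p}$. We have:

$$\begin{aligned}
\| \hat{f}^{(x,m)}_t - f \|_{1\text{-}p} &= \sum_{z \in X} \|z\|_p^p \, | \hat{f}^{(x,m)}_t(z) - f(z) | \\
&\le \sum_{z \in X} \|z\|_p^p \, | f^{(m)}_{-i,t}(z) - f(z) | + \frac{1}{m-1} \sum_{z \in X} \|z\|_p^p \, 1\{x^{(m)}_{i,t} = z\} + \frac{1}{m-1} \sum_{z \in X} \|z\|_p^p \, 1\{z = x\} \\
&= \| f^{(m)}_{-i,t} - f \|_{1\text{-}p} + \frac{1}{m-1} \sum_{z \in X} \|z\|_p^p \, 1\{x^{(m)}_{i,t} = z\} + \frac{1}{m-1} \sum_{z \in X} \|z\|_p^p \, 1\{z = x\}.
\end{aligned}$$

From the induction hypothesis, we have $\| f^{(m)}_{-i,t} - f \|_{1\text{-}p} \to 0$ almost surely as $m \to \infty$. Note that at time $t$, $x^{(m)}_{i,t} \in X_{x_0,t}$ from equation (21), and $X_{x_0,t}$ is finite. Thus,

$$\sup_m \sum_{z \in X} \|z\|_p^p \, 1\{x^{(m)}_{i,t} = z\} < \infty, \text{ almost surely.}$$

This implies that for all states $x \in X$, $\| \hat{f}^{(x,m)}_t - f \|_{1\text{-}p} \to 0$ almost surely as $m \to \infty$. From Assumption 1, we know that the transition kernel is continuous in the population state $f$ (where $\mathfrak{F}_p$ is endowed with the 1-p norm). Thus for every $x \in X$, we have almost surely:

$$P(\cdot \mid x, \mu(x), \hat{f}^{(x,m)}_t) \to P(\cdot \mid x, \mu(x), f), \tag{24}$$

as $m \to \infty$.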

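Next, we show that $f^{(m)}_{-i,t+1}(y) \to f(y)$ almost surely as $m \to \infty$, for all $y$. We leverage equation (23). Observe that the set of points $x \in X$ where $\|x\|_p < 1$ is finite, since $X$ is a subset of an integer lattice. From the induction hypothesis, as $\sum_{x \in X} \|x\|_p^p \, | f^{(m)}_{-i,t}(x) - f(x) | \to 0$ almost surely as $m \to \infty$, it follows that $f^{(m)}_{-i,t}(x) \to f(x)$ almost surely for all $x \in X$ as $m \to \infty$.

Suppose that $x \in X_y$ and $f(x) > 0$. Since $f^{(m)}_{-i,t}(x) \to f(x)$, it follows that $\eta^{(m)}_{-i,t}(x) \to \infty$ as $m \to \infty$, almost surely. Note that $Y^{(m)}_{j,x,t}$ are random variables that are independently drawn according to the transition kernel $P(\cdot \mid x, \mu(x), \hat{f}^{(x,m)}_t)$. From equation (24), and Lemma 11, we get that for every $x, y \in X$, there holds

$$\frac{1}{\eta^{(m)}_{-i,t}(x)} \sum_{j=1}^{\eta^{(m)}_{-i,t}(x)} 1\{Y^{(m)}_{j,x,t} = y\} \to P(y \mid x, \mu(x), f),$$

almost surely as $m \to \infty$.

On the other hand, suppose $x \in X_y$ and $f(x) = 0$. Again, since $f^{(m)}_{-i,t}(x) \to f(x)$ as $m \to \infty$, it follows that as $m \to \infty$, almost surely:

$$f^{(m)}_{-i,t}(x) \Bigg( \frac{1}{\eta^{(m)}_{-i,t}(x)} \sum_{j=1}^{\eta^{(m)}_{-i,t}(x)} 1\{Y^{(m)}_{j,x,t} = y\} \Bigg) \to 0,$$

since the term in brackets is nonnegative and bounded. (Recall we interpret the term in brackets as zero if $\eta^{(m)}_{-i,t}(x) = 0$.) We conclude that, almost surely, as $m \to \infty$:

$$f^{(m)}_{-i,t+1}(y) = \sum_{x \in X_y} f^{(m)}_{-i,t}(x) \Bigg( \frac{1}{\eta^{(m)}_{-i,t}(x)} \sum_{j=1}^{\eta^{(m)}_{-i,t}(x)} 1\{Y^{(m)}_{j,x,t} = y\} \Bigg) \to \sum_{x \in X_y} f(x) P(y \mid x, \mu(x), f) = f(y).$$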



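To complete the proof, we need to show that $\| f^{(m)}_{-i,t+1} - f \|_{1\text{-}p} \to 0$ almost surely as $m \to \infty$. Since $f^{(m)}_{-i,t}(x) \to f(x)$ almost surely, for all $\epsilon > 0$ we have:

$$\lim_{m \to \infty} \sum_{x \in C_{\epsilon,f}} \|x\|_p^p \, | f^{(m)}_{-i,t}(x) - f(x) | = 0.$$

This together with the fact that $\| f^{(m)}_{-i,t} - f \|_{1\text{-}p} \to 0$ implies that, almost surely:

$$\limsup_{m \to \infty} \sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f^{(m)}_{-i,t}(x) < \epsilon. \tag{25}$$

Now at time $t + 1$, we have

$$\sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f^{(m)}_{-i,t+1}(x) = \sum_{x \notin C_{\epsilon,f}} \sum_\ell |x_\ell|^p f^{(m)}_{-i,t+1}(x) \le \sum_{x \notin C_{\epsilon,f}} \sum_\ell (|x_\ell| + M)^p f^{(m)}_{-i,t}(x), \tag{26}$$

where the equality follows because $X$ is a subset of the $d$-dimensional integer lattice. The last inequality follows from the fact that the increments are bounded (Assumption 1). Without loss of generality, assume that $|x_\ell| \ge 1$ and that $M \ge 1$. Then we have:

$$(|x_\ell| + M)^p \le 2^p M^p |x_\ell|^p = K_1 |x_\ell|^p,$$

where we let $K_1 = (2M)^p$. Substituting in equation (26), we have, almost surely,

$$\limsup_{m \to \infty} \sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f^{(m)}_{-i,t+1}(x) \le \limsup_{m \to \infty} \sum_{x \notin C_{\epsilon,f}} \sum_\ell K_1 |x_\ell|^p f^{(m)}_{-i,t}(x) = \limsup_{m \to \infty} K_1 \sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f^{(m)}_{-i,t}(x) \le K_1 \epsilon,$$

where the last inequality follows from equation (25). Now observe that:

$$\| f^{(m)}_{-i,t+1} - f \|_{1\text{-}p} \le \sum_{x \in C_{\epsilon,f}} \|x\|_p^p \, | f^{(m)}_{-i,t+1}(x) - f(x) | + \sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f^{(m)}_{-i,t+1}(x) + \sum_{x \notin C_{\epsilon,f}} \|x\|_p^p f(x).$$

In taking a limsup on the left hand side, the second term on the right hand side is almost surely at most $K_1 \epsilon$. From the definition of $C_{\epsilon,f}$ and equation (22), we get that the third term on the right hand side is also less than $\epsilon$. Finally, since for every $x$, $| f^{(m)}_{-i,t+1}(x) - f(x) | \to 0$ almost surely as $m \to \infty$, and $C_{\epsilon,f}$ is finite, the first term in the above equation approaches zero almost surely as $m \to \infty$ by the Bounded Convergence Theorem. Since $\epsilon$ was arbitrary, this proves the induction step and hence the lemma. ■

The preceding proof uses the following refinement of the strong law of large numbers.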



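Lemma 11. Suppose $0 \le p_k \le 1$ for all $k$, and that $p_k \to p$ as $k \to \infty$. For each $k$, let $Y^{(k)}_1, \ldots, Y^{(k)}_k$ be i.i.d. Bernoulli random variables with parameter $p_k$. Then almost surely:

$$\lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^k Y^{(k)}_i = p.$$

Proof. Let $\epsilon > 0$. By Hoeffding's inequality, we have:

$$\mathrm{Prob}\Bigg( \Big| \frac{1}{k} \sum_{i=1}^k Y^{(k)}_i - p_k \Big| > \epsilon \Bigg) \le 2 \exp(-2 k \epsilon^2),$$

since $0 \le Y^{(k)}_i \le 1$ for all $i, k$. The right hand side is summable in $k$, so by the Borel-Cantelli lemma, the event on the left hand side in the preceding expression occurs for only finitely many $k$, almost surely. Applying this with $\epsilon = 1/n$ for every positive integer $n$, we conclude that almost surely:

$$\lim_{k \to \infty} \Big| p_k - \frac{1}{k} \sum_{i=1}^k Y^{(k)}_i \Big| = 0.$$

The result follows since $p_k \to p$. ■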
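Lemma 11 concerns a triangular array: each row $k$ uses a fresh sample of size $k$ and its own parameter $p_k$. The following sketch simulates one row and reports the gap between the row average and the limit $p$. The particular sequence $p_k = p + 1/(k+1)$, the default $p = 0.3$, and the function names are hypothetical choices for illustration.

```python
import random


def triangular_average(k, p_k, rng):
    """Average of k i.i.d. Bernoulli(p_k) draws (row k of the array)."""
    return sum(1 for _ in range(k) if rng.random() < p_k) / k


def gap_to_limit(k, p=0.3, seed=0):
    """|row average - p| for the assumed sequence p_k = p + 1/(k + 1)."""
    rng = random.Random(seed)
    p_k = p + 1.0 / (k + 1)  # hypothetical parameters converging to p
    return abs(triangular_average(k, p_k, rng) - p)


if __name__ == "__main__":
    for k in (10, 1000, 100000):
        print(k, gap_to_limit(k))
```

The gap has two parts, the sampling error controlled by Hoeffding's inequality and the deterministic error $|p_k - p|$, and both vanish as $k$ grows.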



Before we prove the AME property, we need some additional notation. Let $(\mu, f)$ be a stationary equilibrium. Consider again an $m$ player game and focus on player $i$. Let $x^{(m)}_{i,0} = x_0$ and assume that player $i$ uses a cognizant strategy $\mu_m$. The initial state of every other player $j \ne i$ is independently drawn from the distribution $f$, that is, $x^{(m)}_{j,0} \sim f$. Denote the initial distribution of all $m - 1$ players (excluding player $i$) by $f^{(m)} \in \mathfrak{F}^{(m)}$. The state evolution of player $i$ is given by

$$x^{(m)}_{i,t+1} \sim P(\cdot \mid x^{(m)}_{i,t}, a^{(m)}_{i,t}, f^{(m)}_{-i,t}), \tag{27}$$

where $a^{(m)}_{i,t} = \mu_m(x^{(m)}_{i,t}, f^{(m)}_{-i,t})$ and $f^{(m)}_{-i,t}$ is the actual population distribution. Here the superscript $m$ on the state variable represents the fact that we are considering an $m$ player stochastic game. Let every other player $j$ use the oblivious strategy $\mu$, and thus their state evolution is given by

$$x^{(m)}_{j,t+1} \sim P(\cdot \mid x^{(m)}_{j,t}, \mu(x^{(m)}_{j,t}), f^{(m)}_{-j,t}). \tag{28}$$

Define $V^{(m)}(x, f^{(m)} \mid \mu_m, \mu^{(m-1)})$ to be the actual value function of player $i$, with its initial state $x$, the initial distribution of the rest of the population as $f^{(m)} \in \mathfrak{F}^{(m)}$, when the player uses a cognizant strategy $\mu_m$ and every other player uses an oblivious strategy $\mu$. We have

$$V^{(m)}(x, f^{(m)} \mid \mu_m, \mu^{(m-1)}) = E\Big[ \sum_{t=0}^\infty \beta^t \pi(x^{(m)}_{i,t}, a^{(m)}_{i,t}, f^{(m)}_{-i,t}) \ \Big|\ x^{(m)}_{i,0} = x,\ f^{(m)}_{-i,0} = f^{(m)};\ \mu_i = \mu_m,\ \mu_{-i} = \mu^{(m-1)} \Big]. \tag{29}$$

We define a new player that is coupled to player $i$ in the $m$ player stochastic game defined above. We call this player the coupled player. Let $\tilde{x}^{(m)}_{i,t}$ be the state of this coupled player at time $t$. The subscript $i$ and the superscript $m$ reflect the fact that this player is coupled to player $i$ in an $m$ player stochastic game. We assume that the state evolution of this player is given by:

$$\tilde{x}^{(m)}_{i,t+1} \sim P(\cdot \mid \tilde{x}^{(m)}_{i,t}, \tilde{a}^{(m)}_{i,t}, f), \tag{30}$$

where $\tilde{a}^{(m)}_{i,t} = a^{(m)}_{i,t} = \mu_m(x^{(m)}_{i,t}, f^{(m)}_{-i,t})$. In other words, this coupled player takes the same action as player $i$ at every time $t$ and this action depends on the actual population state of $m - 1$ players. However, note that the state evolution is dependent only on the mean field population state $f$. Let us define

$$\tilde{V}^{(m)}(x \mid f; \mu_m, \mu^{(m-1)}) = E\Big[ \sum_{t=0}^\infty \beta^t \pi(\tilde{x}^{(m)}_{i,t}, \tilde{a}^{(m)}_{i,t}, f) \ \Big|\ \tilde{x}^{(m)}_{i,0} = x,\ f^{(m)}_{-i,0} = f^{(m)};\ \mu_i = \mu_m,\ \mu_{-i} = \mu^{(m-1)} \Big]. \tag{31}$$

Thus, $\tilde{V}^{(m)}(x \mid f; \mu_m, \mu^{(m-1)})$ is the expected net present value of this coupled player, when the player's initial state is $x$, the long run average population state is $f$, and the initial population state is $f^{(m)}_{-i,0} = f^{(m)}$. Observe that

$$\tilde{V}^{(m)}(x \mid f; \mu_m, \mu^{(m-1)}) \le \sup_{\mu' \text{ cognizant}} \tilde{V}^{(m)}(x \mid f; \mu', \mu^{(m-1)}) = \sup_{\mu' \text{ oblivious}} \tilde{V}^{(m)}(x \mid f; \mu', \mu^{(m-1)}) = V^*(x \mid f) = V(x \mid \mu, f). \tag{32}$$

Here, the first equality follows from Lemma 2, which implies that the supremum over all cognizant strategies is the same as the supremum over oblivious strategies (since the state evolution of other players does not affect the payoff of this coupled player), and the last equality follows since $\mu \in \mathcal{P}(f)$.

Lemma 12. Let $(\mu, f)$ be a stationary equilibrium and consider an $m$ player game. Let the initial state of player $i$ be $x^{(m)}_{i,0} = x$, and let $f^{(m)} \in \mathfrak{F}^{(m)}$ be the initial population state of $m - 1$ players whose initial state is sampled independently from the distribution $f$. Assume that player $i$ uses a cognizant strategy $\mu_m$ and every other player uses the oblivious strategy $\mu$. Their state evolutions are given by equations (27) and (28). Also define a coupled player with initial state $\tilde{x}^{(m)}_{i,0} = x$ and let its state evolution be given by equation (30). Then, for all times $t$, and for every $y \in X$, we have $| \mathrm{Prob}(x^{(m)}_{i,t} = y) - \mathrm{Prob}(\tilde{x}^{(m)}_{i,t} = y) | \to 0$, almost surely as $m \to \infty$. (The almost sure convergence of the probabilities is in the randomness associated with the initial population state.)

Proof. The lemma is trivially true for $t = 0$. Let us assume that it holds for all times $\tau = 0, 1, \ldots, t - 1$. Then, we have

$$\mathrm{Prob}(x^{(m)}_{i,t} = y) = \sum_{z \in X_y} \mathrm{Prob}(x^{(m)}_{i,t-1} = z) \, P\big(y \mid z, \mu_m(z, f^{(m)}_{-i,t-1}), f^{(m)}_{-i,t-1}\big),$$

$$\mathrm{Prob}(\tilde{x}^{(m)}_{i,t} = y) = \sum_{z \in X_y} \mathrm{Prob}(\tilde{x}^{(m)}_{i,t-1} = z) \, P\big(y \mid z, \mu_m(z, f^{(m)}_{-i,t-1}), f\big).$$

Here we use the fact that the coupled player uses the same action as player $i$ and the state evolution of the coupled player is given by equation (30). Note that the summation is over all states in the finite set $X_y$, where $X_y$ is defined as in equation (20).

From Lemma 10, we know that for all times $t$, $\| f^{(m)}_{-i,t} - f \|_{1\text{-}p} \to 0$ almost surely as $m \to \infty$. From Assumption 1, we know that the transition kernel is jointly continuous in the action $a$ and distribution $f$ (where the set of distributions $\mathfrak{F}_p$ is endowed with the 1-p norm). Since the action set is compact, this implies that for all $y, z \in X$,

$$\lim_{m \to \infty} \sup_{a \in A} \big| P(y \mid z, a, f^{(m)}_{-i,t-1}) - P(y \mid z, a, f) \big| = 0,$$

almost surely. It follows that for every $y, z \in X$,

$$\lim_{m \to \infty} \big| P\big(y \mid z, \mu_m(z, f^{(m)}_{-i,t-1}), f^{(m)}_{-i,t-1}\big) - P\big(y \mid z, \mu_m(z, f^{(m)}_{-i,t-1}), f\big) \big| = 0,$$

almost surely. From the induction hypothesis, we know that for every $z \in X$,

$$\big| \mathrm{Prob}(x^{(m)}_{i,t-1} = z) - \mathrm{Prob}(\tilde{x}^{(m)}_{i,t-1} = z) \big| \to 0,$$

almost surely as $m \to \infty$. This along with the finiteness of the set $X_y$ gives that for every $y \in X$,

$$\big| \mathrm{Prob}(x^{(m)}_{i,t} = y) - \mathrm{Prob}(\tilde{x}^{(m)}_{i,t} = y) \big| \to 0,$$

almost surely as $m \to \infty$. This proves the lemma. ■



Lemma 13. Let $(\mu, f)$ be a stationary equilibrium and consider an $m$ player game. Let the initial state of player $i$ be $x^{(m)}_{i,0} = x$, and let $f^{(m)} \in \mathfrak{F}^{(m)}$ be the initial population state of $m - 1$ players whose initial state is sampled independently from the distribution $f$. Assume that player $i$ uses a cognizant strategy $\mu_m$ and every other player uses the oblivious strategy $\mu$. Their state evolutions are given by equations (27) and (28). Also define a coupled player with initial state $\tilde{x}^{(m)}_{i,0} = x$ and let its state evolution be given by equation (30). Then, for all times $t$, we have

$$\limsup_{m \to \infty} E\big[ \pi(x^{(m)}_{i,t}, a^{(m)}_{i,t}, f^{(m)}_{-i,t}) - \pi(\tilde{x}^{(m)}_{i,t}, \tilde{a}^{(m)}_{i,t}, f) \big] \le 0,$$

almost surely. (The almost sure convergence of the expected value of the payoff is in the randomness associated with the initial population state.)

Proof. Let us write $a^{(m)}_{i,t} = \mu_m(x^{(m)}_{i,t}, f^{(m)}_{-i,t})$. We have

$$\begin{aligned}
\Delta^{(m)}_{i,t} &= E\big[ \pi(x^{(m)}_{i,t}, a^{(m)}_{i,t}, f^{(m)}_{-i,t}) - \pi(\tilde{x}^{(m)}_{i,t}, \tilde{a}^{(m)}_{i,t}, f) \big] \\
&= E\big[ \pi(x^{(m)}_{i,t}, a^{(m)}_{i,t}, f^{(m)}_{-i,t}) - \pi(x^{(m)}_{i,t}, a^{(m)}_{i,t}, f) \big] + E\big[ \pi(x^{(m)}_{i,t}, a^{(m)}_{i,t}, f) - \pi(\tilde{x}^{(m)}_{i,t}, \tilde{a}^{(m)}_{i,t}, f) \big] \\
&\triangleq T^{(m)}_{1,t} + T^{(m)}_{2,t}.
\end{aligned}$$

Consider the first term. We have

$$T^{(m)}_{1,t} \le \sum_{y \in X} \mathrm{Prob}(x^{(m)}_{i,t} = y) \sup_{a \in A} \big| \pi(y, a, f^{(m)}_{-i,t}) - \pi(y, a, f) \big| = \sum_{y \in X_{x,t}} \mathrm{Prob}(x^{(m)}_{i,t} = y) \sup_{a \in A} \big| \pi(y, a, f^{(m)}_{-i,t}) - \pi(y, a, f) \big|,$$

where the last equality follows from the fact that $x^{(m)}_{i,0} = x$ and from equation (21). From Assumption 1, we know that the payoff is jointly continuous in action $a$ and distribution $f$ (with the set of distributions $\mathfrak{F}_p$ endowed with the 1-p norm) and the set $A$ is compact. Thus, for every $y \in X$, we have $\sup_{a \in A} | \pi(y, a, f^{(m)}_{-i,t}) - \pi(y, a, f) | \to 0$, almost surely as $m \to \infty$. This along with the fact that $X_{x,t}$ is finite shows that $\limsup_{m \to \infty} T^{(m)}_{1,t} \le 0$ almost surely.

Now consider the second term. We have

$$\begin{aligned}
T^{(m)}_{2,t} &= E\big[ \pi(x^{(m)}_{i,t}, a^{(m)}_{i,t}, f) - \pi(\tilde{x}^{(m)}_{i,t}, \tilde{a}^{(m)}_{i,t}, f) \big] \\
&\le \sum_{y \in X} \big| \mathrm{Prob}(x^{(m)}_{i,t} = y) - \mathrm{Prob}(\tilde{x}^{(m)}_{i,t} = y) \big| \sup_{a \in A} | \pi(y, a, f) | \\
&= \sum_{y \in X_{x,t}} \big| \mathrm{Prob}(x^{(m)}_{i,t} = y) - \mathrm{Prob}(\tilde{x}^{(m)}_{i,t} = y) \big| \sup_{a \in A} | \pi(y, a, f) |,
\end{aligned}$$

where the last equality follows from the fact that $x^{(m)}_{i,0} = \tilde{x}^{(m)}_{i,0} = x$ and from Definition 11. From Lemma 12, we know that for every $y \in X$, $| \mathrm{Prob}(x^{(m)}_{i,t} = y) - \mathrm{Prob}(\tilde{x}^{(m)}_{i,t} = y) | \to 0$ almost surely as $m \to \infty$. Since $X_{x,t}$ is finite for every fixed $x \in X$ and every time $t$, this implies that $\limsup_{m \to \infty} T^{(m)}_{2,t} \le 0$ almost surely. This proves the lemma. ■

Before we proceed further, we need one additional piece of notation. Once again let $(\mu, f)$ be a stationary equilibrium and consider an oblivious player. Let $\hat{x}_t$ be the state of this oblivious player at time $t$. We assume that $\hat{x}_0 = x$ and since the player uses the oblivious strategy $\mu$, the state evolution of this player is given by

$$\hat{x}_{t+1} \sim P(\cdot \mid \hat{x}_t, \hat{a}_t, f), \tag{33}$$

where $\hat{a}_t = \mu(\hat{x}_t)$. We let $V(x \mid \mu, f)$ (as defined in equation (7)) be the oblivious value function for this player starting from state $x$.

Also, consider an $m$ player game and focus on player $i$. We represent the state of player $i$ at time $t$ by $\check{x}^{(m)}_{i,t}$. As before, the superscript $m$ on the state variable represents the fact that we are considering an $m$ player stochastic game. Let $\check{x}^{(m)}_{i,0} = x$ and let player $i$ also use the oblivious strategy $\mu$. The initial state of every other player $j \ne i$ is drawn independently from the distribution $f$, that is, $\check{x}^{(m)}_{j,0} \sim f$. Denote the initial distribution of all $m - 1$ players (excluding player $i$) by $f^{(m)} \in \mathfrak{F}^{(m)}$. The state evolution of player $i$ is then given by

$$\check{x}^{(m)}_{i,t+1} \sim P(\cdot \mid \check{x}^{(m)}_{i,t}, \check{a}^{(m)}_{i,t}, f^{(m)}_{-i,t}), \tag{34}$$

where $\check{a}^{(m)}_{i,t} = \mu(\check{x}^{(m)}_{i,t})$. Note that even though the player uses an oblivious strategy, its state evolution is affected by the actual population state. Let every other player $j$ also use the oblivious strategy $\mu$ and let their state evolution be given by

$$\check{x}^{(m)}_{j,t+1} \sim P(\cdot \mid \check{x}^{(m)}_{j,t}, \mu(\check{x}^{(m)}_{j,t}), f^{(m)}_{-j,t}). \tag{35}$$

Define $V^{(m)}(x, f^{(m)} \mid \mu^{(m)})$ to be the actual value function of the player, when the initial state of the player is $x$, the initial population distribution is $f^{(m)}$ and every player uses the oblivious strategy $\mu$. That is,

$$V^{(m)}(x, f^{(m)} \mid \mu^{(m)}) = E\Big[ \sum_{t=0}^\infty \beta^t \pi(\check{x}^{(m)}_{i,t}, \check{a}^{(m)}_{i,t}, f^{(m)}_{-i,t}) \ \Big|\ \check{x}^{(m)}_{i,0} = x,\ f^{(m)}_{-i,0} = f^{(m)};\ \mu_i = \mu,\ \mu_{-i} = \mu^{(m-1)} \Big]. \tag{36}$$



(m) 
i,0 



Lemma 14. Lef (fi, /) fte a stationary equilibrium and consider an m player stochastic game. Let x 
and let /( m ) £ g"( m ) be the initial population state ofm — 1 players whose initial state is sampled indepen- 
dently from f. Assume that every player uses the oblivious strategy Li and their state evolutions are given by 
equations (34) and (35). Also, consider an oblivious player with xq = x and let its state evolution be given 



by equation (33). Then, for every time t and for all y G X, we have 
0, almost surely as m — > oo. 



Prob(xf = y) — Prob(i 



y) 



Proof. The lemma is trivially true for t = 0. Let us assume that it holds for all times r = 0, 1, • • • , t — 1. 
Then, we have 



Prob (xt = y) = Prob (xt-i = z) P 



>J 



Prob = y) = £ Prob (ij™^ = z) P (y *, ,*(*), /™ 



.(m) 



Note that the summation above is over all states in a finite set X v (as defined in Definition 1 1). 



From Lemma 10, we know that for all times t, 



l-p 



almost surely as m — > oo. From 



Assumption 1, we know that the transition kernel is continuous in the distribution (where the set of distribu- 



tions 3p is endowed with l-p norm). From the induction hypothesis, we know that 
0. This along with the fmiteness of the set X y , gives that for every x G X 



Prob (xt-i 



Prob 



Prob (xt = x) — Prob (x\ 



id 











almost surely as m — > oo. This proves the lemma. 
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Lemma 15. Let (/i, /) be a stationary equilibrium and consider an m player stochastic game. Let xf^ 1 = 
x, and let /( m ) g be the initial population state of m — 1 players whose initial state is sampled 

independently from f. Assume that every player uses the oblivious strategy and their state evolutions are 
given by equations (34) and (35). Also, consider an oblivious player with xq = x and let its state evolution 



be given by equation (33). Then for all times t, we have E 
0, almost surely as m — > oo. 

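Proof. Define $\Delta^{(m)}_t$ as

\[
\begin{aligned}
\Delta^{(m)}_t
&= \mathbb{E}\Bigl[ \pi(\hat{x}_t, \mu(\hat{x}_t), f) - \pi\bigl(x^{(m)}_{1,t}, \mu(x^{(m)}_{1,t}), f^{(m)}_{-1,t}\bigr) \Bigr] \\
&= \mathbb{E}\Bigl[ \pi(\hat{x}_t, \mu(\hat{x}_t), f) - \pi\bigl(\hat{x}_t, \mu(\hat{x}_t), f^{(m)}_{-1,t}\bigr) \Bigr]
 + \mathbb{E}\Bigl[ \pi\bigl(\hat{x}_t, \mu(\hat{x}_t), f^{(m)}_{-1,t}\bigr) - \pi\bigl(x^{(m)}_{1,t}, \mu(x^{(m)}_{1,t}), f^{(m)}_{-1,t}\bigr) \Bigr] \\
&\triangleq T^{(m)}_{1,t} + T^{(m)}_{2,t}.
\end{aligned}
\]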



Note that from Lemma 10, we have that $\| f^{(m)}_{-1,t} - f \|_{1\text{-}p} \to 0$ almost surely as $m \to \infty$. From Assumption 1, we know that the payoff is continuous in the distribution, where the set of distributions $\mathfrak{F}_p$ is endowed with the $1$-$p$ norm. Thus, for every $y$ and $a$, we have

\[
\bigl| \pi(y, a, f) - \pi\bigl(y, a, f^{(m)}_{-1,t}\bigr) \bigr| \to 0 \tag{37}
\]

as $m \to \infty$. Consider the first term. We have:

\[
T^{(m)}_{1,t} = \sum_{y \in \mathcal{X}_{x,t}} \mathrm{Prob}(\hat{x}_t = y) \Bigl( \pi(y, \mu(y), f) - \pi\bigl(y, \mu(y), f^{(m)}_{-1,t}\bigr) \Bigr),
\]

where the last equality follows from the fact that $\hat{x}_0 = x$ and from Definition 11. Since $\mathcal{X}_{x,t}$ is a finite set for every initial state $x \in \mathcal{X}$ and every time $t$, we get that $T^{(m)}_{1,t} \to 0$ almost surely as $m \to \infty$.
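Consider now the second term. We have:

\[
\begin{aligned}
T^{(m)}_{2,t}
&= \mathbb{E}\Bigl[ \pi\bigl(\hat{x}_t, \mu(\hat{x}_t), f^{(m)}_{-1,t}\bigr) - \pi\bigl(x^{(m)}_{1,t}, \mu(x^{(m)}_{1,t}), f^{(m)}_{-1,t}\bigr) \Bigr] \\
&= \sum_{y \in \mathcal{X}_{x,t}} \mathrm{Prob}(\hat{x}_t = y) \, \pi\bigl(y, \mu(y), f^{(m)}_{-1,t}\bigr)
 - \sum_{y \in \mathcal{X}_{x,t}} \mathrm{Prob}\bigl(x^{(m)}_{1,t} = y\bigr) \, \pi\bigl(y, \mu(y), f^{(m)}_{-1,t}\bigr) \\
&= \sum_{y \in \mathcal{X}_{x,t}} \Bigl( \mathrm{Prob}(\hat{x}_t = y) - \mathrm{Prob}\bigl(x^{(m)}_{1,t} = y\bigr) \Bigr) \, \pi\bigl(y, \mu(y), f^{(m)}_{-1,t}\bigr).
\end{aligned}
\]

From Lemma 14, equation (37), and the finiteness of $\mathcal{X}_{x,t}$, we get that $\limsup_{m \to \infty} T^{(m)}_{2,t} \le 0$ almost surely. This proves the lemma. ■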

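Proof of Theorem 2. Let us define

\[
\Delta V^{(m)}(x, f^{(m)}) = V^{(m)}\bigl(x, f^{(m)} \mid \mu_m, \mu^{(m-1)}\bigr) - V^{(m)}\bigl(x, f^{(m)} \mid \mu^{(m)}\bigr).
\]

Then we need to show that for all $x$, $\limsup_{m \to \infty} \Delta V^{(m)}(x, f^{(m)}) \le 0$ almost surely. We can write

\[
\begin{aligned}
\Delta V^{(m)}(x, f^{(m)})
&= V^{(m)}\bigl(x, f^{(m)} \mid \mu_m, \mu^{(m-1)}\bigr) - \tilde{V}(x \mid \mu, f) + \tilde{V}(x \mid \mu, f) - V^{(m)}\bigl(x, f^{(m)} \mid \mu^{(m)}\bigr) \\
&\le V^{(m)}\bigl(x, f^{(m)} \mid \mu_m, \mu^{(m-1)}\bigr) - \tilde{V}^{(m)}\bigl(x \mid f; \mu_m, \mu^{(m-1)}\bigr) + \tilde{V}(x \mid \mu, f) - V^{(m)}\bigl(x, f^{(m)} \mid \mu^{(m)}\bigr) \\
&\triangleq T^{(m)}_1 + T^{(m)}_2.
\end{aligned}
\]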



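Here the inequality follows from equation (32). Consider the term $T^{(m)}_1$. We have

\[
T^{(m)}_1 = \mathbb{E}\Bigl[ \sum_{t=0}^{\infty} \beta^t \Bigl( \pi\bigl(x^{(m)}_{1,t}, a^{(m)}_{1,t}, f^{(m)}_{-1,t}\bigr) - \pi\bigl(\tilde{x}^{(m)}_{1,t}, \tilde{a}^{(m)}_{1,t}, f\bigr) \Bigr) \Bigr],
\]

where the last equality follows from equations (29) and (31). Note that $x^{(m)}_{1,0} = \tilde{x}^{(m)}_{1,0} = x$ and $a^{(m)}_{1,t} = \tilde{a}^{(m)}_{1,t} = \mu_m\bigl(x^{(m)}_{1,t}, f^{(m)}_{-1,t}\bigr)$, and the state transitions of players are given by equations (27), (28), and (30). From Lemma 13, we have

\[
\limsup_{m \to \infty} \mathbb{E}\Bigl[ \pi\bigl(x^{(m)}_{1,T}, a^{(m)}_{1,T}, f^{(m)}_{-1,T}\bigr) - \pi\bigl(\tilde{x}^{(m)}_{1,T}, \tilde{a}^{(m)}_{1,T}, f\bigr) \Bigr] \le 0,
\]

almost surely for any finite time $T$. From Lemma 1, we have, almost surely,

\[
\mathbb{E}\Bigl[ \sum_{t=T}^{\infty} \beta^t \Bigl( \pi\bigl(x^{(m)}_{1,t}, a^{(m)}_{1,t}, f^{(m)}_{-1,t}\bigr) - \pi\bigl(\tilde{x}^{(m)}_{1,t}, \tilde{a}^{(m)}_{1,t}, f\bigr) \Bigr) \Bigr] \le 2 C(x, T),
\]

which goes to zero as $T \to \infty$. This proves that $\limsup_{m \to \infty} T^{(m)}_1 \le 0$ almost surely. Similar analysis (with an application of Lemma 15) shows that $\limsup_{m \to \infty} T^{(m)}_2 \le 0$ almost surely, yielding the result. ■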
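The truncation step for $T^{(m)}_1$ can be spelled out as follows. This is a sketch only: $D^{(m)}_t$ is a hypothetical shorthand, introduced here, for the period-$t$ difference between the profit along the $m$-player trajectory and the profit along the trajectory evaluated at $f$, and $\beta$ is assumed to denote the discount factor.

```latex
\begin{align*}
T_1^{(m)}
  &= \sum_{t=0}^{T-1} \beta^t \, \mathbb{E}\bigl[ D_t^{(m)} \bigr]
     + \mathbb{E}\Bigl[ \sum_{t=T}^{\infty} \beta^t D_t^{(m)} \Bigr]
   \le \sum_{t=0}^{T-1} \beta^t \, \mathbb{E}\bigl[ D_t^{(m)} \bigr]
     + 2 C(x, T), \\
\limsup_{m \to \infty} T_1^{(m)}
  &\le \sum_{t=0}^{T-1} \beta^t \limsup_{m \to \infty}
       \mathbb{E}\bigl[ D_t^{(m)} \bigr] + 2 C(x, T)
   \le 2 C(x, T) \xrightarrow[T \to \infty]{} 0 .
\end{align*}
```

The first line splits the discounted sum at $T$ and bounds the tail using Lemma 1. The second line uses the fact that the limsup of a finite sum is at most the sum of the limsups, together with Lemma 13 applied at each of the finitely many times $t < T$. Since the left-hand side does not depend on $T$, letting $T \to \infty$ gives $\limsup_{m \to \infty} T^{(m)}_1 \le 0$.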

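Proof of Theorem 3. Similar to the proof of Theorem 2, let us define

\[
\Delta V^{(m)}(x, f^{(m)}) \triangleq V^{(m)}\bigl(x, f^{(m)} \mid \mu_m, \mu^{(m-1)}\bigr) - V^{(m)}\bigl(x, f^{(m)} \mid \mu^{(m)}\bigr).
\]

Then we need to show that for all $x$, $\limsup_{m \to \infty} \Delta V^{(m)}(x, f^{(m)}) \le 0$ almost surely. We can write

\[
\begin{aligned}
\Delta V^{(m)}(x, f^{(m)})
&= V^{(m)}\bigl(x, f^{(m)} \mid \mu_m, \mu^{(m-1)}\bigr) - \tilde{V}(x \mid \mu, f) + \tilde{V}(x \mid \mu, f) - V^{(m)}\bigl(x, f^{(m)} \mid \mu^{(m)}\bigr) \\
&\le V^{(m)}\bigl(x, f^{(m)} \mid \mu_m, \mu^{(m-1)}\bigr) - \tilde{V}^{(m)}\bigl(x \mid f; \mu_m, \mu^{(m-1)}\bigr) + \tilde{V}(x \mid \mu, f) - V^{(m)}\bigl(x, f^{(m)} \mid \mu^{(m)}\bigr) \\
&\triangleq T^{(m)}_1 + T^{(m)}_2,
\end{aligned}
\]

where $\tilde{V}^{(m)}$ is defined as in (31) (and in particular, using the limit profit function $\pi$). Here the inequality follows from equation (32). Consider the term $T^{(m)}_1$. We have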



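\[
T^{(m)}_1 = \mathbb{E}\Bigl[ \sum_{t=0}^{\infty} \beta^t \Bigl( \pi_m\bigl(x^{(m)}_{1,t}, a^{(m)}_{1,t}, f^{(m)}_{-1,t}\bigr) - \pi\bigl(\tilde{x}^{(m)}_{1,t}, \tilde{a}^{(m)}_{1,t}, f\bigr) \Bigr) \Bigr],
\]

where the last equality follows from equation (29) (with $\pi$ replaced by $\pi_m$) and equation (31). Note that $x^{(m)}_{1,0} = \tilde{x}^{(m)}_{1,0} = x$ and $a^{(m)}_{1,t} = \tilde{a}^{(m)}_{1,t} = \mu_m\bigl(x^{(m)}_{1,t}, f^{(m)}_{-1,t}\bigr)$, and the state transitions of players are given by equations (27), (28), and (30). Now,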



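\[
\begin{aligned}
T^{(m)}_1
&= \mathbb{E}\Bigl[ \sum_{t=0}^{\infty} \beta^t \Bigl( \pi_m\bigl(x^{(m)}_{1,t}, a^{(m)}_{1,t}, f^{(m)}_{-1,t}\bigr) - \pi_m\bigl(\tilde{x}^{(m)}_{1,t}, \tilde{a}^{(m)}_{1,t}, f\bigr) \Bigr) \Bigr] \\
&\quad + \mathbb{E}\Bigl[ \sum_{t=0}^{\infty} \beta^t \Bigl( \pi_m\bigl(\tilde{x}^{(m)}_{1,t}, \tilde{a}^{(m)}_{1,t}, f\bigr) - \pi\bigl(\tilde{x}^{(m)}_{1,t}, \tilde{a}^{(m)}_{1,t}, f\bigr) \Bigr) \Bigr].
\end{aligned}
\]

Using equicontinuity and the uniform growth rate bound, a similar argument to the proof of Theorem 2 (via Lemmas 13 and 15) shows that:

\[
\limsup_{m \to \infty} \mathbb{E}\Bigl[ \sum_{t=0}^{\infty} \beta^t \Bigl( \pi_m\bigl(x^{(m)}_{1,t}, a^{(m)}_{1,t}, f^{(m)}_{-1,t}\bigr) - \pi_m\bigl(\tilde{x}^{(m)}_{1,t}, \tilde{a}^{(m)}_{1,t}, f\bigr) \Bigr) \Bigr] \le 0,
\]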



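almost surely. Recall that, for all $x, a, f$, $\lim_{m \to \infty} \pi_m(x, a, f) = \pi(x, a, f)$. Since $\mathcal{A}$ is compact and increments are bounded, it follows that $\pi_m\bigl(\tilde{x}^{(m)}_{1,t}, \tilde{a}^{(m)}_{1,t}, f\bigr) - \pi\bigl(\tilde{x}^{(m)}_{1,t}, \tilde{a}^{(m)}_{1,t}, f\bigr) \to 0$ almost surely as $m \to \infty$, for all times $t$. Using the fact that increments are bounded, the uniform growth rate bound, and the dominated convergence theorem, the expectation of the preceding difference also approaches zero almost surely. Finally, by truncating the sum at time $T$, an argument similar to the proof of Theorem 2 gives:

\[
\limsup_{m \to \infty} \mathbb{E}\Bigl[ \sum_{t=0}^{\infty} \beta^t \Bigl( \pi_m\bigl(x^{(m)}_{1,t}, a^{(m)}_{1,t}, f^{(m)}_{-1,t}\bigr) - \pi\bigl(\tilde{x}^{(m)}_{1,t}, \tilde{a}^{(m)}_{1,t}, f\bigr) \Bigr) \Bigr] \le 0.
\]

This proves that $\limsup_{m \to \infty} T^{(m)}_1 \le 0$ almost surely. Similar analysis shows that $\limsup_{m \to \infty} T^{(m)}_2 \le 0$ almost surely, yielding the result. ■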